SYNTHETIC NUCLEIC ACID MOLECULE AND METHODS OF PREPARATION

Background 

Transcription, the synthesis of an RNA molecule from a sequence of DNA, is the
first step in gene expression. Sequences which regulate DNA transcription
include promoter sequences, polyadenylation signals, transcription factor binding
sites and enhancer elements. A promoter is a DNA sequence capable of specific
initiation of transcription and consists of three general regions. The core
promoter is the sequence where the RNA polymerase and its cofactors bind to the
DNA. Immediately upstream of the core promoter is the proximal promoter, which
contains several transcription factor binding sites that are responsible for the
assembly of an activation complex that in turn recruits the polymerase complex.
The distal promoter, located further upstream of the proximal promoter, also
contains transcription factor binding sites. Transcription termination and
polyadenylation, like transcription initiation, are site specific and encoded by
defined sequences. Enhancers are regulatory regions, containing multiple
transcription factor binding sites, that can significantly increase the level of
transcription from a responsive promoter regardless of the enhancer's
orientation and distance with respect to the promoter, as long as the enhancer
and promoter are located within the same DNA molecule. The amount of
transcript produced from a gene may also be regulated by a post-transcriptional
mechanism, the most important being RNA splicing, which removes intervening
sequences (introns) from a primary transcript between splice donor and splice
acceptor sequences.

Natural selection is the hypothesis that genotype-environment
interactions occurring at the phenotypic level lead to differential reproductive
success of individuals and therefore to modification of the gene pool of a
population. Some properties of nucleic acid molecules that are acted upon by
natural selection include codon usage frequency, RNA secondary structure, the
efficiency of intron splicing, and interactions with transcription factors or other
nucleic acid binding proteins. Because of the degenerate nature of the genetic
code, these properties can be optimized by natural selection without altering the
corresponding amino acid sequence.
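The standard genetic code shows this degeneracy directly. The short Python sketch below is an illustration only and is not part of the described method. It builds the standard codon table (NCBI translation table 1) from the conventional T, C, A, G ordering and counts how many codons encode each amino acid. `*` denotes a stop codon.

```python
from collections import Counter

# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# T, C, A, G order for the first, second and third base; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def synonymous_codon_counts() -> Counter:
    """Return the number of codons that encode each amino acid (and stop)."""
    return Counter(CODON_TABLE.values())


if __name__ == "__main__":
    counts = synonymous_codon_counts()
    for amino_acid in sorted(counts):
        print(amino_acid, counts[amino_acid])
```

In this table leucine (L), serine (S) and arginine (R) each have six codons, while methionine (M) and tryptophan (W) have only one. Most positions in a coding sequence can therefore be changed without changing the encoded protein.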

Under some conditions, it is useful to synthetically alter the natural
nucleotide sequence encoding a polypeptide to better adapt the polypeptide for
alternative applications. A common example is to alter the codon usage
frequency of a gene when it is expressed in a foreign host cell. Although
redundancy in the genetic code allows amino acids to be encoded by multiple
codons, different organisms favor some codons over others. It has been found
that the efficiency of protein translation in a non-native host cell can be
substantially increased by adjusting the codon usage frequency while maintaining
the same gene product (U.S. Patent Nos. 5,096,825, 5,670,356, and 5,874,304).
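Codon usage frequency is commonly tabulated as the number of times each codon occurs per 1,000 codons. The sketch below computes such a profile for an in-frame coding sequence. Comparing the profile with a host organism's usage table shows which codons are candidates for adjustment. The function name and the per-thousand convention are choices made for this illustration, and the sketch assumes the sequence starts at the first base of a codon.

```python
from collections import Counter


def codon_usage_per_thousand(orf: str) -> dict[str, float]:
    """Codon usage of an in-frame coding sequence, expressed per 1,000 codons.

    Assumes `orf` starts at the first base of a codon; a trailing incomplete
    codon is ignored, and RNA input (U) is treated as DNA (T).
    """
    seq = orf.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    total = len(codons)
    return {codon: 1000.0 * n / total for codon, n in counts.items()}
```

For example, `codon_usage_per_thousand("ATGCTGCTGTAA")` reports CTG at 500 per thousand and ATG and TAA at 250 each.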

However, altering codon usage may, in turn, result in the unintentional
introduction of inappropriate transcription regulatory sequences into a synthetic
nucleic acid molecule. This may adversely affect transcription, resulting in
anomalous expression of the synthetic DNA. Anomalous expression is defined
as departure from normal or expected levels of expression. For example,
transcription factor binding sites located downstream from a promoter have been
demonstrated to affect promoter activity (Michael et al., 1990; Lamb et al., 1998;
Johnson et al., 1998; Jones et al., 1997). Additionally, it is not uncommon for
an enhancer element to exert activity and result in elevated levels of DNA
transcription in the absence of a promoter sequence, or for the presence of
transcription regulatory sequences to increase the basal levels of gene expression
in the absence of a promoter sequence.

Thus, what is needed is a method for making synthetic nucleic acid
molecules with altered codon usage for expression in a particular host cell,
without also introducing inappropriate or unintended transcription regulatory
sequences.

Summary of the Invention 

The invention provides an isolated nucleic acid molecule (a
polynucleotide) comprising a synthetic nucleotide sequence having reduced
nucleic acid sequence identity, for instance, 90% or less, e.g., 80%, 78%, 75%,
or 70% or less, relative to a parent nucleic acid sequence, e.g., a wild-type
nucleic acid sequence, and having fewer regulatory sequences such as
transcription regulatory sequences. In one embodiment, the synthetic nucleotide
sequence has fewer regulatory sequences than would result if the sequence
differences between the synthetic nucleotide sequence and the parent nucleic
acid sequence, e.g., optionally the result of differing codons, were randomly
selected. In one embodiment, the synthetic nucleotide sequence encodes a
polypeptide that has an amino acid sequence that is at least 85%, 90%, 95%, or
99%, or 100%, identical to the amino acid sequence of a naturally-occurring
(native or wild-type) corresponding polypeptide (protein). Thus, it is recognized
that some specific amino acid changes may also be desirable to alter a particular
phenotypic characteristic of a polypeptide encoded by the synthetic nucleotide
sequence. Preferably, the amino acid sequence identity is over at least 100
contiguous amino acid residues. In one embodiment of the invention, the codons
in the synthetic nucleotide sequence that differ preferably encode the same amino
acids as the corresponding codons in the parent nucleic acid sequence.
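A short sketch can check the two identity criteria above. It assumes the synthetic and parent sequences have the same length and reading frame, as they do after codon-for-codon substitution, so a simple ungapped position-by-position comparison is enough. Comparisons with gaps would need an alignment tool instead. The helper names are hypothetical.

```python
# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def percent_identity(parent: str, synthetic: str) -> float:
    """Ungapped nucleotide identity (%) between two equal-length sequences."""
    if not parent or len(parent) != len(synthetic):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == s for p, s in zip(parent.upper(), synthetic.upper()))
    return 100.0 * matches / len(parent)


def translate(orf: str) -> str:
    """Translate an in-frame coding sequence codon by codon (trailing bases ignored)."""
    seq = orf.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def encodes_same_protein(parent: str, synthetic: str) -> bool:
    """True when every codon that differs is a synonymous substitution."""
    return len(parent) == len(synthetic) and translate(parent) == translate(synthetic)
```

For example, `percent_identity("ATGCTTAGT", "ATGCTGAGC")` is 7/9, about 77.8%. `encodes_same_protein` returns `True` for the same pair because both sequences encode Met-Leu-Ser.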

Hence, in one embodiment, the invention provides an isolated nucleic
acid molecule comprising a synthetic nucleotide sequence having a coding
region for a selectable or screenable polypeptide, wherein the synthetic
nucleotide sequence has 90% or less, e.g., 80% or less, nucleic acid sequence
identity to a parent nucleic acid sequence encoding a corresponding selectable or
screenable polypeptide, and wherein the synthetic nucleotide sequence encodes a
selectable or screenable polypeptide with at least 85% amino acid sequence
identity to the corresponding selectable or screenable polypeptide encoded by the
parent nucleic acid sequence. The decreased nucleotide sequence identity may be
a result of different codons in the synthetic nucleotide sequence relative to the
codons in the parent nucleic acid sequence. The synthetic nucleotide sequence of
the invention has a reduced number of regulatory sequences relative to the
parent nucleic acid sequence, for example, relative to the average number of
regulatory sequences resulting from random selections of codons or nucleotides
at the positions which differ between the synthetic nucleotide sequence and the
parent nucleic acid sequence. In one embodiment, a nucleic acid molecule may
include a synthetic nucleotide sequence which together with other sequences
encodes a selectable or screenable polypeptide. For instance, a synthetic
nucleotide sequence which forms part of an open reading frame for a selectable
or screenable polypeptide may include at least 100, 150, 200, 250, 300 or more
nucleotides of the open reading frame, which nucleotides have reduced nucleic
acid sequence identity relative to corresponding sequences in a parent nucleic
acid sequence. In one embodiment, the parent nucleic acid sequence is SEQ ID
NO:1, SEQ ID NO:6, SEQ ID NO:15 or SEQ ID NO:41, the complement
thereof, or a sequence that has 90%, 95% or 99% nucleic acid sequence identity
thereto.
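The comparison with random codon selection can be estimated by simulation. The sketch below is a hypothetical illustration, not the method used for the sequences described here. It recodes a parent open reading frame many times by choosing synonymous codons uniformly at random, counts motif hits on both strands, and reports the mean. Two motifs stand in for a real regulatory-sequence database: AATAAA, the canonical polyadenylation signal, and TATAAA, a TATA box consensus. The uniform codon choice is also an assumption of the sketch.

```python
import random
from statistics import mean

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Group codons by the amino acid (or stop) they encode.
SYNONYMS: dict[str, list[str]] = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)

# Illustrative motif set only; a real screen would use a curated database.
MOTIFS = ("AATAAA", "TATAAA")


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_motifs(seq: str, motifs=MOTIFS) -> int:
    """Count overlapping hits on both strands (motifs assumed non-palindromic)."""
    top = seq.upper()
    total = 0
    for strand in (top, reverse_complement(top)):
        for m in motifs:
            total += sum(strand[i:i + len(m)] == m for i in range(len(strand) - len(m) + 1))
    return total


def random_synonymous_recoding(orf: str, rng: random.Random) -> str:
    """Replace every codon with a uniformly chosen synonymous codon (length must be a multiple of 3)."""
    seq = orf.upper()
    return "".join(rng.choice(SYNONYMS[CODON_TABLE[seq[i:i + 3]]]) for i in range(0, len(seq), 3))


def random_baseline(orf: str, trials: int = 1000, seed: int = 0) -> float:
    """Mean motif count over `trials` random synonymous recodings of `orf`."""
    rng = random.Random(seed)
    return float(mean(count_motifs(random_synonymous_recoding(orf, rng)) for _ in range(trials)))
```

Under this sketch, a candidate synthetic sequence meets the criterion when `count_motifs(candidate)` is below `random_baseline(parent)`.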

In one embodiment, the nucleic acid molecule of the invention comprises
sequences which have been optimized for expression in mammalian cells, and
more preferably, in human cells (see, e.g., WO 02/16944, which discloses
methods to optimize sequences for expression in a cell of interest). For instance,
nucleic acid molecules may be optimized for expression in eukaryotic cells by
introducing a Kozak sequence and/or one or more introns, by decreasing the
number of other regulatory sequences, and/or by altering codon usage to codons
employed more frequently in one or more eukaryotic organisms, e.g., codons
employed more frequently in a eukaryotic host cell to be transformed with the
nucleic acid molecule.
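A simple rule can screen whether a start codon sits in a favorable Kozak context. The commonly cited consensus is GCCRCCATGG. Its most influential positions are the purine (A or G) at -3 and the G at +4, counting the A of ATG as +1. The sketch below classifies an ATG by those two positions and prepends GCCACC to an open reading frame. The three-level labels and the function names are conventions chosen for this illustration. The sketch deliberately does not rewrite +4, because changing that base alters the second codon and may change the encoded amino acid.

```python
PURINES = {"A", "G"}


def kozak_context(seq: str, atg_index: int) -> str:
    """Classify the ATG starting at 0-based `atg_index` as strong, adequate or weak.

    strong:   purine at -3 and G at +4
    adequate: only one of the two
    weak:     neither (positions outside the sequence count as non-matching)
    """
    s = seq.upper()
    if s[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    minus3 = s[atg_index - 3] if atg_index >= 3 else ""
    plus4 = s[atg_index + 3] if atg_index + 3 < len(s) else ""
    hits = int(minus3 in PURINES) + int(plus4 == "G")
    return ("weak", "adequate", "strong")[hits]


def add_kozak(orf: str, leader: str = "GCCACC") -> str:
    """Prepend a Kozak leader; it supplies the -3 purine but leaves +4 unchanged."""
    if not orf.upper().startswith("ATG"):
        raise ValueError("open reading frame must start with ATG")
    return leader + orf
```

For an open reading frame whose fourth base is already G (e.g., ATGGCT), adding the leader yields a strong context.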

In one embodiment, the synthetic nucleotide sequence is present in a
vector, e.g., a plasmid, and such a vector may include other optimized sequences.
In one embodiment, the synthetic nucleotide sequence encodes a polypeptide
comprising a selectable polypeptide, which synthetic nucleotide sequence has at
least 90% nucleic acid sequence identity to an open reading frame in a
sequence comprising, for example, SEQ ID NO:5, SEQ ID NO:9, SEQ ID
NO:10, SEQ ID NO:11, SEQ ID NO:30, SEQ ID NO:38, SEQ ID NO:39, SEQ
ID NO:42, SEQ ID NO:44, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72,
SEQ ID NO:73, SEQ ID NO:74, SEQ ID NO:80, SEQ ID NO:81, SEQ ID
NO:82, SEQ ID NO:83, SEQ ID NO:84, the complement thereof, or a fragment
thereof that encodes a polypeptide with substantially the same activity as the
corresponding full-length and optionally wild-type (functional) polypeptide, e.g.,
a polypeptide encoded by SEQ ID NO:1, SEQ ID NO:6, SEQ ID NO:15 or SEQ
ID NO:41, or a portion thereof which together with other parent or wild-type
sequences encodes a polypeptide with substantially the same activity as the
corresponding full-length and optionally wild-type polypeptide. As used herein,
"substantially the same activity" is at least about 70%, e.g., 80%, 90% or more,
of the activity of a corresponding full-length and optionally wild-type (functional)
polypeptide. In one embodiment, an isolated nucleic acid molecule encodes a
fusion polypeptide comprising a selectable polypeptide.

Also provided is an isolated nucleic acid molecule comprising a synthetic
nucleotide sequence having a coding region for a firefly luciferase, wherein the
nucleic acid sequence identity of the synthetic nucleic acid molecule is 90% or
less, e.g., 80%, 78%, 75% or less, compared to a parent nucleic acid sequence
encoding a firefly luciferase, e.g., a parent nucleic acid sequence having SEQ ID
NO:14 or SEQ ID NO:43, which synthetic nucleotide sequence has fewer
regulatory sequences, including transcription regulatory sequences, than would
result if the sequence differences, e.g., differing codons, were randomly selected.
Preferably, the synthetic nucleotide sequence encodes a polypeptide that has an
amino acid sequence that is at least 85%, preferably 90%, and most preferably
95% or 99% identical to the amino acid sequence of a naturally-occurring or
parent polypeptide. Thus, it is recognized that some specific amino acid changes
may be desirable to alter a particular phenotypic characteristic of the luciferase
encoded by the synthetic nucleotide sequence. Preferably, the amino acid
sequence identity is over at least 100 contiguous amino acid residues. In one
embodiment, the synthetic nucleotide sequence encodes a polypeptide
comprising a firefly luciferase, which synthetic nucleotide sequence has at least
90% nucleic acid sequence identity to an open reading frame in a
sequence comprising, for example, SEQ ID NO:21, SEQ ID NO:22, SEQ ID
NO:23, the complement thereof, or a fragment thereof that encodes a polypeptide
with substantially the same activity as the corresponding full-length and
optionally wild-type (functional) polypeptide, e.g., a polypeptide encoded by
SEQ ID NO:14 or SEQ ID NO:43, or a portion thereof which together with other
sequences encodes a firefly luciferase. For instance, a synthetic nucleotide
sequence which forms part of an open reading frame for a firefly luciferase may
include at least 100, 150, 200, 250, 300 or more nucleotides of the open reading
frame, which nucleotides have reduced nucleic acid sequence identity relative to
corresponding sequences in a parent nucleic acid sequence.



WO 2006/034061 PCT/US2005/033218



In another embodiment, the invention provides an isolated nucleic acid
molecule comprising a synthetic nucleotide sequence which does not include an
open reading frame encoding a peptide or polypeptide of interest, e.g., the
synthetic nucleotide sequence may have an open reading frame but it does not
include sequences that encode a functional or desirable peptide or polypeptide,
but may include one or more stop codons in one or more reading frames, one or
more poly(A) adenylation sites, and/or a contiguous sequence of recognition sites
for two or more restriction endonucleases (restriction enzymes), i.e., a multiple
cloning region (also referred to as a multiple cloning site, "MCS"), and which is
generally at least 20, e.g., at least 30, nucleotides in length and up to 1000 or
more nucleotides, e.g., up to 10,000 nucleotides, which synthetic nucleotide
sequence has fewer regulatory sequences such as transcription regulatory
sequences relative to a corresponding parent nucleic acid sequence. In one
embodiment, the synthetic nucleotide sequence which does not encode a peptide
or polypeptide has 90% or less, e.g., 80% or less, nucleic acid sequence identity
to a parent nucleic acid sequence, wherein the decreased sequence identity is a
result of a reduced number of regulatory sequences in the synthetic nucleotide
sequence relative to the parent nucleic acid sequence.
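One property of such a noncoding synthetic sequence is easy to verify: whether every reading frame contains a stop codon, which limits how far any spurious open reading frame can extend. The sketch below lists stop-codon positions in the three forward frames. Running it on the reverse complement would cover the other strand. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_codon_positions(seq: str) -> dict[int, list[int]]:
    """0-based start positions of stop codons in forward frames 0, 1 and 2."""
    s = seq.upper()
    return {
        frame: [i for i in range(frame, len(s) - 2, 3) if s[i:i + 3] in STOP_CODONS]
        for frame in range(3)
    }


def has_stop_in_every_frame(seq: str) -> bool:
    """True if each forward reading frame contains at least one stop codon."""
    return all(stop_codon_positions(seq).values())
```

For example, the 11-base sequence TAAGTAGCTGA has TAA in frame 0, TAG in frame 1 and TGA in frame 2.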

The regulatory sequences which are reduced in the synthetic nucleotide
sequence include, but are not limited to, any combination of transcription factor
binding sequences, intron splice sites, poly(A) adenylation sites (poly(A)
sequences or poly(A) sites hereinafter), enhancer sequences, promoter modules,
and/or promoter sequences, e.g., prokaryotic promoter sequences. Generally, a
synthetic nucleic acid molecule lacks at least 10%, 20%, 50% or more of the
regulatory sequences, for instance lacks substantially all of the regulatory
sequences, e.g., 80%, 90% or more, for instance, 95% or more, of the regulatory
sequences, present in a corresponding parent or wild-type nucleotide sequence.
Regulatory sequences, e.g., transcription regulatory sequences, are well known in
the art. The synthetic nucleotide sequence may also have a reduced number of
restriction enzyme recognition sites, and may be modified to include selected
sequences, e.g., sequences at or near the 5' and/or 3' ends of the synthetic
nucleotide sequence such as Kozak sequences and/or desirable restriction
enzyme recognition sites, for instance, restriction enzyme recognition sites useful
to introduce a synthetic nucleotide sequence to a specified location, e.g., in a
multiple cloning region 5' and/or 3' to a nucleic acid sequence of interest.
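The percentage of regulatory sequences removed can be computed by counting motif hits in the parent and synthetic sequences. The motif list in this sketch is purely illustrative:

- AATAAA, a polyadenylation signal.
- TATAAA, a TATA box consensus.
- GAATTC, the EcoRI recognition site, standing in for restriction sites to be eliminated.

The sketch scans both strands. Palindromic motifs such as GAATTC are counted on one strand only, so a single site is not counted twice.

```python
ILLUSTRATIVE_MOTIFS = ("AATAAA", "TATAAA", "GAATTC")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_sites(seq: str, motifs=ILLUSTRATIVE_MOTIFS) -> int:
    """Count overlapping motif hits on both strands of `seq`."""
    top = seq.upper()
    bottom = reverse_complement(top)
    total = 0
    for motif in motifs:
        # A palindromic motif reads the same on both strands, so scan one strand only.
        strands = (top,) if motif == reverse_complement(motif) else (top, bottom)
        for strand in strands:
            total += sum(
                strand[i:i + len(motif)] == motif
                for i in range(len(strand) - len(motif) + 1)
            )
    return total


def percent_reduction(parent: str, synthetic: str, motifs=ILLUSTRATIVE_MOTIFS) -> float:
    """Percentage of the parent's motif hits absent from the synthetic sequence."""
    before = count_sites(parent, motifs)
    if before == 0:
        return 0.0
    return 100.0 * (before - count_sites(synthetic, motifs)) / before
```

A negative result would indicate that the synthetic sequence gained sites relative to its parent.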

In one embodiment, the synthetic nucleotide sequence of the invention
has a codon composition that differs from that of the parent or wild-type nucleic
acid sequence. Preferred codons for use in the invention are those which are
employed more frequently than at least one other codon for the same amino acid
in a particular organism, and/or those that are not low-usage codons in that
organism, and/or those that are not low-usage codons in the organism used to
clone or screen for the expression of the synthetic nucleotide sequence (for
example, E. coli). Moreover, codons for certain amino acids (i.e., those amino
acids that have three or more codons) may include two or more codons that are
employed more frequently than the other (non-preferred) codon(s). The presence
of codons in a synthetic nucleotide sequence that are employed more frequently
in one organism than in another organism results in a synthetic nucleotide
sequence which, when introduced into the cells of the organism that employs
those codons more frequently, has a reduced risk of aberrant expression and/or is
expressed in those cells at a level that may be greater than the expression of the
wild-type (unmodified) nucleic acid sequence in those cells under some
conditions. For example, a synthetic nucleic acid molecule of the invention
which encodes a selectable or screenable polypeptide may be expressed at a level
that is greater, e.g., at least about 2, 3, 4, 5, 10-fold or more, relative to that of
the parent or wild-type (unmodified) nucleic acid sequence in a cell or cell extract
under identical conditions (such as cell culture conditions, vector backbone, and
the like). In one embodiment, the synthetic nucleotide sequence of the invention
has a codon composition that differs from that of the parent or wild-type nucleic
acid sequence at more than 10%, 20% or more, e.g., 30%, 35%, 40% or more
than 45%, e.g., 50%, 55%, 60% or more, of the codons.
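The fraction of codons that differ is a codon-level measure. It is distinct from nucleotide identity, because a changed codon may differ at one, two or three positions. A minimal sketch, assuming both sequences are in frame and of equal length:

```python
def percent_codons_changed(parent: str, synthetic: str) -> float:
    """Percentage of codon positions at which two in-frame sequences differ."""
    p, s = parent.upper(), synthetic.upper()
    if not p or len(p) != len(s) or len(p) % 3:
        raise ValueError("need non-empty, equal-length sequences whose length is a multiple of 3")
    changed = sum(p[i:i + 3] != s[i:i + 3] for i in range(0, len(p), 3))
    return 100.0 * changed / (len(p) // 3)
```

For ATGCTTAGT versus ATGCTGAGC, two of the three codons differ (about 66.7%), whereas nucleotide identity is 7 of 9 positions (about 77.8%).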

In one embodiment of the invention, the codons that are different are
those employed more frequently in a mammal, while in another embodiment the
codons that are different are those employed more frequently in a plant. A
particular type of mammal, e.g., human, may have a different set of preferred
codons than another type of mammal. Likewise, a particular type of plant may
have a different set of preferred codons than another type of plant. In one
embodiment of the invention, the majority of the codons which differ are ones
that are preferred codons in a desired host cell and/or are not low-usage codons
in a particular host cell. Preferred codons for mammals (e.g., humans) and plants
are known to the art (e.g., Wada et al., 1990). For example, preferred human
codons include, but are not limited to, CGC (Arg), CTG (Leu), AGC (Ser), ACC
(Thr), CCC (Pro), GCC (Ala), GGC (Gly), GTG (Val), ATC (Ile), AAG (Lys),
AAC (Asn), CAG (Gln), CAC (His), GAG (Glu), GAC (Asp), TAC (Tyr), TGC
(Cys) and TTC (Phe) (Wada et al., 1990). Thus, synthetic nucleotide sequences
of the invention have a codon composition which differs from a wild-type
nucleic acid sequence by having an increased number of preferred human
codons, e.g., CGC, CTG, TCT, AGC, ACC, CCC, GCC, GGC, GTG, ATC,
AAG, AAC, CAG, CAC, GAG, GAC, TAC, TGC, TTC, or any combination
thereof. For example, the synthetic nucleotide sequence of the invention may
have an increased number of AGC serine-encoding codons, CCC proline-encoding
codons, and/or ACC threonine-encoding codons, or any combination
thereof, relative to the parent or wild-type nucleic acid sequence. Similarly,
synthetic nucleotide sequences having an increased number of codons that are
employed more frequently in plants have a codon composition which differs
from a wild-type nucleic acid sequence by having an increased number of the
plant codons including, but not limited to, CGC (Arg), CTT (Leu), TCT (Ser),
TCC (Ser), ACC (Thr), CCA (Pro), CCT (Pro), GCT (Ala), GGA (Gly), GTG
(Val), ATC (Ile), ATT (Ile), AAG (Lys), AAC (Asn), CAA (Gln), CAC (His),
GAG (Glu), GAC (Asp), TAC (Tyr), TGC (Cys), TTC (Phe), or any combination
thereof (Murray et al., 1989). Preferred codons may differ for different types of
plants (Wada et al., 1990).
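The sketch below replaces each codon with a preferred codon for the same amino acid. Its table uses the preferred human codons listed above. Methionine and tryptophan are absent because each has a single codon. Stop codons are left as they are in the parent sequence, which is a choice made for this illustration. Substituting the plant codons listed above would give a plant-oriented version. On its own, this substitution does nothing to avoid creating regulatory sequences.

```python
# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Preferred human codons as listed in the text (Wada et al., 1990).
PREFERRED_HUMAN = {
    "R": "CGC", "L": "CTG", "S": "AGC", "T": "ACC", "P": "CCC", "A": "GCC",
    "G": "GGC", "V": "GTG", "I": "ATC", "K": "AAG", "N": "AAC", "Q": "CAG",
    "H": "CAC", "E": "GAG", "D": "GAC", "Y": "TAC", "C": "TGC", "F": "TTC",
}


def recode_with_preferred(orf: str, preferred: dict[str, str] = PREFERRED_HUMAN) -> str:
    """Swap each codon for the preferred synonymous codon; the protein is unchanged.

    Codons whose amino acid has no entry in `preferred` (Met, Trp, stop) are kept.
    """
    seq = orf.upper()
    if len(seq) % 3:
        raise ValueError("open reading frame length must be a multiple of 3")
    out = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        out.append(preferred.get(CODON_TABLE[codon], codon))
    return "".join(out)
```

For example, `recode_with_preferred("ATGCTTAGTTAA")` returns `"ATGCTGAGCTAA"`, still encoding Met-Leu-Ser followed by a stop.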

The nucleotide substitutions in the synthetic nucleic acid sequence may
be influenced by many factors, for example, the desire to have an increased
number of nucleotide substitutions, such as those resulting in a silent nucleotide
substitution (one that encodes the same amino acid), and/or a decreased number
of regulatory sequences. Under some circumstances (e.g., to permit removal of a
transcription factor binding site), it may be desirable to replace a non-preferred
codon with a codon other than a preferred codon, or other than the most
preferred codon, in order to decrease the number of regulatory sequences.
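A greedy, left-to-right recoding is one simple way to combine codon preference with the removal of regulatory sequences. At each position the preferred codon is tried first. A synonymous alternative is used only if the preferred codon would complete a motif on either strand. The sketch below is a hypothetical illustration of that idea, not the procedure used to design the sequences described here, and it has three limitations:

- The motif CTGCTG used in the usage note is invented for the example.
- Alternatives are tried in genetic-code table order rather than ranked by host codon usage.
- Because the search is greedy it can fail to avoid a motif, so the output should be re-screened.

```python
# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
SYNONYMS: dict[str, list[str]] = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def completes_motif(prefix: str, codon: str, patterns: set[str]) -> bool:
    """True if appending `codon` creates a pattern hit that ends inside that codon."""
    s = prefix + codon
    for pattern in patterns:
        for end in range(len(s) - 3, len(s)):
            start = end - len(pattern) + 1
            if start >= 0 and s[start:end + 1] == pattern:
                return True
    return False


def recode_avoiding_motifs(orf: str, preferred: dict[str, str], motifs: list[str]) -> str:
    """Greedy synonymous recoding that favors `preferred` codons but avoids `motifs`."""
    seq = orf.upper()
    if len(seq) % 3:
        raise ValueError("open reading frame length must be a multiple of 3")
    # Scanning the top strand for each motif and its reverse complement covers both strands.
    patterns = {m.upper() for m in motifs} | {reverse_complement(m.upper()) for m in motifs}
    out = ""
    for i in range(0, len(seq), 3):
        original = seq[i:i + 3]
        aa = CODON_TABLE[original]
        first = preferred.get(aa, original)
        candidates = [first] + [c for c in SYNONYMS[aa] if c != first]
        # Fall back to the first choice if every synonymous codon completes a motif.
        chosen = next((c for c in candidates if not completes_motif(out, c, patterns)), first)
        out += chosen
    return out
```

With `preferred={"L": "CTG"}` and the invented motif CTGCTG, recoding CTTCTT gives CTGTTA rather than CTGCTG. Both still encode Leu-Leu.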
The invention also provides an expression cassette or vector. The
expression cassette of the invention comprises a synthetic nucleotide sequence of
the invention operatively linked to a promoter that is functional in a cell, and the
vector of the invention comprises a synthetic nucleotide sequence of the
invention. Preferred promoters are those functional in mammalian cells and those
functional in plant cells. Optionally, the expression cassette may include other
sequences, e.g., one or more restriction enzyme recognition sequences 5' and/or
3' to an open reading frame for a selectable polypeptide or luciferase and/or a
Kozak sequence, and may be part of a larger polynucleotide molecule such as a
plasmid, cosmid, artificial chromosome or vector, e.g., a viral vector, which may
include a multiple cloning region for other sequences, e.g., promoters, enhancers,
other open reading frames and/or poly(A) sites. In one embodiment, a vector of
the invention includes SEQ ID NO:88, SEQ ID NO:89, SEQ ID NO:90, the
complement thereof, or a sequence which has at least 80% nucleic acid sequence
identity thereto and encodes a selectable and/or screenable polypeptide.

In one embodiment, the synthetic nucleotide sequence encoding a
selectable or screenable polypeptide is introduced into a vector backbone, e.g.,
one which optionally has a poly(A) site 3' to the synthetic nucleotide sequence, a
gene useful for selecting transformed prokaryotic cells which optionally is a
synthetic sequence, a gene useful for selecting transformed eukaryotic cells
which optionally is a synthetic sequence, a noncoding region for decreasing
transcription and/or translation into adjacent linked desirable open reading
frames, and/or a multiple cloning region 5' and/or 3' to the synthetic nucleotide
sequence encoding a selectable or screenable polypeptide which optionally
includes one or more protein destabilization sequences (see U.S. application
Serial No. 10/664,341, filed September 16, 2003, the disclosure of which is
incorporated by reference herein). In one embodiment, the vector having a
synthetic nucleotide sequence encoding a selectable or screenable polypeptide
may lack a promoter and/or enhancer which is operably linked to that synthetic
sequence. In another embodiment, the invention provides a vector comprising a
promoter, e.g., a prokaryotic or eukaryotic promoter, operably linked to a
synthetic nucleotide sequence encoding a selectable or screenable polypeptide.
Such vectors optionally include one or more multiple cloning regions, such as
ones that are useful to introduce an additional open reading frame and/or a
promoter for expression of the open reading frame, which promoter optionally is
different than the promoter for the selectable or screenable polypeptide, and/or a
prokaryotic origin of replication. A "vector backbone" as used herein may
include sequences (open reading frames) useful to identify cells with those
sequences, e.g., in prokaryotic cells, their promoters, an origin of replication for
vector maintenance, e.g., in prokaryotic cells, and optionally one or more other
sequences including multiple cloning regions, e.g., for insertion of a promoter
and/or open reading frame of interest, and sequences which inhibit transcription
and/or translation.

Also provided is a host cell comprising the synthetic nucleotide sequence
of the invention, an isolated polypeptide (e.g., a fusion polypeptide encoded by
the synthetic nucleotide sequence of the invention), and compositions and kits
comprising the synthetic nucleotide sequence of the invention, a polypeptide
encoded thereby, or an expression cassette or vector comprising the synthetic
nucleotide sequence in suitable container means and, optionally, instruction
means. The host cell may be a eukaryotic cell such as a plant or vertebrate cell,
e.g., a mammalian cell, including but not limited to a human, non-human
primate, canine, feline, bovine, equine, ovine, rabbit, ferret or rodent (e.g., rat,
hamster, or mouse) cell, or a prokaryotic cell.

The invention also provides a method to prepare a synthetic nucleotide
sequence of the invention by genetically altering a parent, e.g., a wild-type or
synthetic, nucleic acid sequence. The method comprises altering (e.g.,
decreasing or eliminating) a plurality of regulatory sequences in a parent nucleic
acid sequence, e.g., one which encodes a selectable or screenable polypeptide or
one which does not encode a peptide or polypeptide, to yield a synthetic
nucleotide sequence which has a decreased number of regulatory sequences and,
if the synthetic nucleotide sequence encodes a polypeptide, preferably encodes
the same amino acids as the parent nucleic acid molecule. The transcription
regulatory sequences which are reduced include, but are not limited to, any of
transcription factor binding sequences, intron splice sites, poly(A) sites, enhancer
sequences, promoter modules, and/or promoter sequences. Preferably, the
alteration of sequences in the synthetic nucleotide sequence does not result in an
increase in regulatory sequences. In one embodiment, the synthetic nucleotide
sequence encodes a polypeptide that has at least 85%, 90%, 95% or 99%, or
100%, contiguous amino acid sequence identity to the amino acid sequence of
the polypeptide encoded by the parent nucleic acid sequence.

Thus, in one embodiment, a method to prepare a synthetic nucleic acid
molecule comprising an open reading frame is provided. The method includes
altering the codons and/or regulatory sequences in a parent nucleic acid sequence
which encodes a reporter protein, such as a firefly luciferase, or a selectable
polypeptide, such as one conferring resistance to ampicillin, puromycin,
hygromycin or neomycin, to yield a synthetic nucleotide sequence which encodes
a corresponding reporter or selectable polypeptide and which has, for instance, at
least 10% or more, e.g., 20%, 30%, 40%, 50% or more, fewer regulatory
sequences relative to the parent nucleic acid sequence. The synthetic nucleotide
sequence has 90% or less, e.g., 85%, 80%, or 78% or less, nucleic acid
sequence identity to the parent nucleic acid sequence and encodes a polypeptide
with at least 85% amino acid sequence identity to the polypeptide encoded by the
parent nucleic acid sequence. The regulatory sequences which are altered include
transcription factor binding sequences, intron splice sites, poly(A) sites, promoter
modules, and/or promoter sequences. In one embodiment, the synthetic nucleic
acid sequence hybridizes to the parent nucleic acid sequence or the complement
thereof under medium-stringency but not under stringent hybridization
conditions. In one embodiment, the codons which differ encode the same amino
acids as the corresponding codons in the parent nucleic acid sequence.

Also provided is a synthetic (including a further synthetic) nucleotide
sequence prepared by the methods of the invention, e.g., a further synthetic
nucleotide sequence in which introduced regulatory sequences or restriction
endonuclease recognition sequences are optionally removed. Thus, the method
of the invention may be employed to alter the codon usage frequency and/or
decrease the number of regulatory sequences in any open reading frame, or to
decrease the number of regulatory sequences in any nucleic acid sequence, e.g., a
noncoding sequence. Preferably, the codon usage frequency in a synthetic
nucleotide sequence which encodes a selectable or screenable polypeptide is
altered to reflect that of the host organism desired for expression of that
nucleotide sequence while also decreasing the number of potential regulatory
sequences relative to the parent nucleic acid molecule.

Also provided is a method to prepare a synthetic nucleic acid molecule
which does not code for a peptide or polypeptide. The method includes
altering the nucleotides in a parent nucleic acid sequence having at least 20
nucleotides, which optionally does not code for a functional or desirable peptide
or polypeptide and which optionally may include sequences which inhibit
transcription and/or translation, to yield a synthetic nucleotide sequence which
does not include an open reading frame encoding a peptide or polypeptide of
interest, e.g., the synthetic nucleotide sequence may have an open reading frame
but it does not include sequences that encode a functional or desirable peptide or
polypeptide, but may include one or more stop codons in one or more reading
frames, one or more poly(A) adenylation sites, and/or a contiguous sequence of
recognition sites for two or more restriction endonucleases, i.e., a multiple
cloning region. The synthetic nucleotide sequence is generally at least 20, e.g.,
at least 30, nucleotides in length and up to 1000 or more nucleotides, e.g., up to
10,000 nucleotides, and has fewer regulatory sequences such as transcription
regulatory sequences relative to a corresponding parent nucleic acid sequence
which does not code for a peptide or polypeptide, e.g., a parent nucleic acid
sequence which optionally includes sequences which inhibit transcription and/or
translation. The nucleotides are altered to reduce one or more regulatory
sequences, e.g., transcription factor binding sequences, intron splice sites,
poly(A) sites, enhancer sequences, promoter modules, and/or promoter
sequences, in the parent nucleic acid sequence.

The invention also provides a method to prepare an expression vector.
The method includes providing a linearized plasmid having a nucleic acid
molecule including a synthetic nucleotide sequence of the invention which
encodes a selectable or screenable polypeptide and which is flanked at the 5'
and/or 3' end by a multiple cloning region. The plasmid is linearized by
contacting the plasmid with at least one restriction endonuclease which cleaves
in the multiple cloning region. The linearized plasmid and an expression cassette
having ends compatible with the ends in the linearized plasmid are annealed,
yielding an expression vector. In one embodiment, the plasmid is linearized by
cleavage by at least two restriction endonucleases, only one of which cleaves in
the multiple cloning region.
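Choosing enzymes for this kind of linearization involves two checks:

- The enzyme cuts the circular plasmid only once, within the multiple cloning region.
- The ends it leaves are compatible with those of the fragment to be inserted.

The sketch below performs both checks for a small, hand-entered set of common enzymes with palindromic recognition sites, each listed with its standard top-strand cleavage position (e.g., G^AATTC for EcoRI). It is an illustration only. It ignores methylation sensitivity and does not handle enzymes with non-palindromic sites, and the enzyme table and function names are choices made for this example.

```python
# Recognition site and top-strand cut offset (number of bases before the cut).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "BglII": ("AGATCT", 1),    # A^GATCT
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "XbaI": ("TCTAGA", 1),     # T^CTAGA
    "SpeI": ("ACTAGT", 1),     # A^CTAGT
    "EcoRV": ("GATATC", 3),    # GAT^ATC (blunt)
    "PstI": ("CTGCAG", 5),     # CTGCA^G
    "KpnI": ("GGTACC", 5),     # GGTAC^C
}


def end_type(enzyme: str) -> tuple[str, str]:
    """Return ("5'", overhang), ("3'", overhang) or ("blunt", "") for a palindromic site."""
    site, cut = ENZYMES[enzyme]
    bottom_cut = len(site) - cut  # mirror position of the bottom-strand cut
    if cut == bottom_cut:
        return ("blunt", "")
    if cut < bottom_cut:
        return ("5'", site[cut:bottom_cut])
    return ("3'", site[bottom_cut:cut])


def compatible(enzyme_a: str, enzyme_b: str) -> bool:
    """Ends can be joined directly if both are blunt or they share the same overhang."""
    return end_type(enzyme_a) == end_type(enzyme_b)


def site_positions(plasmid: str, site: str) -> list[int]:
    """Start positions of `site` in a circular plasmid (hits spanning the origin included)."""
    seq = plasmid.upper()
    wrapped = seq + seq[:len(site) - 1]
    return [i for i in range(len(seq)) if wrapped[i:i + len(site)] == site]


def unique_cutters_in_region(plasmid: str, start: int, end: int) -> list[str]:
    """Enzymes that cut the plasmid exactly once, with the site starting in [start, end)."""
    hits = []
    for name, (site, _) in ENZYMES.items():
        positions = site_positions(plasmid, site)
        if len(positions) == 1 and start <= positions[0] < end:
            hits.append(name)
    return sorted(hits)
```

For example, BamHI and BglII both leave a 5' GATC overhang, so `compatible("BamHI", "BglII")` is `True`. EcoRI (AATT) and HindIII (AGCT) ends are not compatible.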

Also provided is a method to clone a promoter or open reading frame.
The method comprises providing a linearized plasmid having a multiple cloning
region and a synthetic sequence of the invention which encodes a selectable or
screenable polypeptide and/or a synthetic sequence of the invention which does
not encode a peptide or polypeptide, which plasmid is linearized by contacting
the plasmid with at least two restriction endonucleases, at least one of which
cleaves in the multiple cloning region; and annealing the linearized plasmid with
DNA having a promoter or an open reading frame with ends compatible with the
ends of the linearized plasmid.

Exemplary methods to prepare synthetic sequences for firefly luciferase
and a number of selectable polypeptide nucleic acid sequences, as well as
noncoding regions present in a vector backbone, are described hereinbelow. For
instance, the methods may produce synthetic selectable polypeptide nucleic acid
molecules which exhibit similar or significantly enhanced levels of mammalian
expression without negatively affecting other desirable physical or biochemical
properties and which are also largely devoid of regulatory elements.

Clearly, the present invention has applications with many genes and
across many fields of science including, but not limited to, life science research,
agrigenetics, genetic therapy, developmental science and pharmaceutical
development.

Brief Description of the Figures

Figure 1. Codons and their corresponding amino acids.

Figure 2. Design scheme for the pGL4 vector. 

Detailed Description of the Invention 

Definitions 

The term "nucleic acid molecule" or "nucleic acid sequence" as used
herein, refers to nucleic acid, DNA or RNA, that comprises noncoding or coding
sequences. Coding sequences are necessary for the production of a polypeptide
or protein precursor. The polypeptide can be encoded by a full-length coding
sequence or by any portion of the coding sequence, as long as the desired protein
activity is retained. Noncoding sequences refer to nucleic acids which do not
code for a polypeptide or protein precursor, and may include regulatory elements
such as transcription factor binding sites, poly(A) sites, restriction endonuclease
sites, stop codons and/or promoter sequences.

A "synthetic" nucleic acid sequence is one which is not found in nature, 
i.e., it has been derived using molecular biological, chemical and/or informatic 
techniques. 

A "nucleic acid", as used herein, is a covalently linked sequence of
nucleotides in which the 3' position of the pentose of one nucleotide is joined by
a phosphodiester group to the 5' position of the pentose of the next, and in which
the nucleotide residues (bases) are linked in specific sequence, i.e., a linear order
of nucleotides. A "polynucleotide", as used herein, is a nucleic acid containing a
sequence that is greater than about 100 nucleotides in length. An
"oligonucleotide" or "primer", as used herein, is a short polynucleotide or a
portion of a polynucleotide. An oligonucleotide typically contains a sequence of
about two to about one hundred bases. The word "oligo" is sometimes used in
place of the word "oligonucleotide".

Nucleic acid molecules are said to have a "5'-terminus" (5' end) and a
"3'-terminus" (3' end) because nucleic acid phosphodiester linkages occur to the
5' carbon and 3' carbon of the pentose ring of the substituent mononucleotides.
The end of a polynucleotide at which a new linkage would be to a 5' carbon is its
5' terminal nucleotide. The end of a polynucleotide at which a new linkage
would be to a 3' carbon is its 3' terminal nucleotide. A terminal nucleotide, as
used herein, is the nucleotide at the end position of the 3'- or 5'-terminus.

DNA molecules are said to have "5' ends" and "3' ends" because
mononucleotides are reacted to make oligonucleotides in a manner such that the
5' phosphate of one mononucleotide pentose ring is attached to the 3' oxygen of
its neighbor in one direction via a phosphodiester linkage. Therefore, an end of
an oligonucleotide is referred to as the "5' end" if its 5' phosphate is not linked to
the 3' oxygen of a mononucleotide pentose ring and as the "3' end" if its 3'
oxygen is not linked to a 5' phosphate of a subsequent mononucleotide pentose
ring.

As used herein, a nucleic acid sequence, even if internal to a larger
oligonucleotide or polynucleotide, also may be said to have 5' and 3' ends. In
either a linear or circular DNA molecule, discrete elements are referred to as
being "upstream" or 5' of the "downstream" or 3' elements. This terminology
reflects the fact that transcription proceeds in a 5' to 3' fashion along the DNA
strand. Promoter and enhancer elements that direct transcription of a
linked gene (e.g., open reading frame or coding region) are generally located 5'
or upstream of the coding region. However, enhancer elements can exert their
effect even when located 3' of the promoter element and the coding region.
Transcription termination and polyadenylation signals are located 3' or
downstream of the coding region.

The term "codon", as used herein, refers to a basic genetic coding unit
consisting of a sequence of three nucleotides that specifies a particular amino acid
to be incorporated into a polypeptide chain, or a start or stop signal. The term
"coding region" when used in reference to structural genes refers to the
nucleotide sequences that encode the amino acids found in the nascent
polypeptide as a result of translation of a mRNA molecule. Typically, the coding
region is bounded on the 5' side by the nucleotide triplet "ATG" which encodes
the initiator methionine and on the 3' side by a stop codon (e.g., TAA, TAG,
TGA). In some cases, the coding region is known to initiate with the nucleotide
triplet "TTG".
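The definitions of codon and coding region can be illustrated by locating a coding region in a DNA string and translating it. This is a minimal Python sketch, assuming the standard genetic code and a single forward reading strand; it is not a gene finder, and the function names are hypothetical. The optional `start_codon` argument allows the TTG case mentioned above.

```python
from typing import Optional

# Standard genetic code: codons enumerated with bases in T, C, A, G order at
# each of the three positions; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf(seq: str, start_codon: str = "ATG") -> Optional[str]:
    """Return the first stretch that begins with `start_codon` and runs to an
    in-frame stop codon (stop included), or None if there is none."""
    seq = seq.upper()
    start = seq.find(start_codon)
    while start != -1:
        for pos in range(start, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                return seq[start:pos + 3]
        start = seq.find(start_codon, start + 1)
    return None


def translate(orf: str) -> str:
    """Translate codon by codon, stopping at the first stop codon.

    This is a naive lookup: an alternative start such as TTG is rendered as
    "L", although at a true start site the initiator tRNA typically inserts
    methionine.
    """
    protein = []
    for pos in range(0, len(orf) - 2, 3):
        residue = CODON_TABLE[orf[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


orf = first_orf("CCATGAAATTTTAAGG")
print(orf, translate(orf))  # ATGAAATTTTAA MKF
```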

By "protein", "polypeptide" or "peptide" is meant any chain of amino
acids, regardless of length or post-translational modification (e.g., glycosylation
or phosphorylation). The nucleic acid molecules of the invention may also
encode a variant of a naturally-occurring protein or a fragment thereof.
Preferably, such a variant protein has an amino acid sequence that is at least
85%, preferably 90%, and most preferably 95% or 99% identical to the amino
acid sequence of the naturally-occurring (native or wild-type) protein from which
it is derived.

Polypeptide molecules are said to have an "amino terminus" (N-terminus)
and a "carboxy terminus" (C-terminus) because peptide linkages occur between
the backbone amino group of a first amino acid residue and the backbone
carboxyl group of a second amino acid residue. The terms "N-terminal" and
"C-terminal" in reference to polypeptide sequences refer to regions of
polypeptides including portions of the N-terminal and C-terminal regions of the
polypeptide, respectively. A sequence that includes a portion of the N-terminal
region of a polypeptide includes amino acids predominantly from the N-terminal
half of the polypeptide chain, but is not limited to such sequences. For example,
an N-terminal sequence may include an interior portion of the polypeptide
sequence including amino acids from both the N-terminal and C-terminal halves of the
polypeptide. The same applies to C-terminal regions. N-terminal and
C-terminal regions may, but need not, include the amino acid defining the
ultimate N-terminus and C-terminus of the polypeptide, respectively.

The term "wild-type" as used herein, refers to a gene or gene product that
has the characteristics of that gene or gene product isolated from a naturally
occurring source. A wild-type gene is that which is most frequently observed in
a population and is thus arbitrarily designated the "wild-type" form of the gene.
In contrast, the term "mutant" refers to a gene or gene product that displays
modifications in sequence and/or functional properties (i.e., altered
characteristics) when compared to the wild-type gene or gene product. It is noted
that naturally-occurring mutants can be isolated; these are identified by the fact
that they have altered characteristics when compared to the wild-type gene or
gene product.

The term "recombinant protein" or "recombinant polypeptide" as used
herein refers to a protein molecule expressed from a recombinant DNA
molecule. In contrast, the term "native protein" is used herein to indicate a
protein isolated from a naturally occurring (i.e., a nonrecombinant) source.
Molecular biological techniques may be used to produce a recombinant form of a
protein with identical properties as compared to the native form of the protein.

The term "fusion polypeptide" refers to a chimeric protein containing a
protein of interest (e.g., luciferase) joined to a heterologous sequence (e.g., a
non-luciferase amino acid or protein).

The terms "cell," "cell line," and "host cell," as used herein, are used
interchangeably, and all such designations include progeny or potential progeny
of these designations. By "transformed cell" is meant a cell into which (or into
an ancestor of which) a nucleic acid molecule of the invention has been
introduced, e.g., via transient transfection. Optionally, a nucleic acid molecule or
synthetic gene of the invention may be introduced into a suitable cell line so as to
create a stably-transfected cell line capable of producing the protein or
polypeptide encoded by the synthetic gene. Vectors, cells, and methods for
constructing such cell lines are well known in the art. The words
"transformants" or "transformed cells" include the primary transformed cells
derived from the originally transformed cell without regard to the number of
transfers. Not all progeny may be precisely identical in DNA content, due to
deliberate or inadvertent mutations. Nonetheless, mutant progeny that have the
same functionality as screened for in the originally transformed cell are included
in the definition of transformants.

Nucleic acids are known to contain different types of mutations. A
"point" mutation refers to an alteration in the sequence of a nucleotide at a single
base position from the wild-type sequence. Mutations may also refer to insertion
or deletion of one or more bases, so that the nucleic acid sequence differs from
the wild-type sequence.
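A point mutation, as defined above, shows up in a direct position-by-position comparison once two sequences are the same length. The sketch below lists such substitutions. It assumes the sequences are already aligned and deliberately refuses sequences of different length, because an insertion or deletion shifts every downstream position and would need an alignment step first. The function name is hypothetical.

```python
def point_mutations(wild_type: str, variant: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, wild-type base, variant base) for every
    single-base difference between two equal-length, aligned sequences."""
    if len(wild_type) != len(variant):
        # A length difference implies an insertion or deletion; an alignment
        # step would be needed before positions can be compared.
        raise ValueError("sequence lengths differ; align before comparing")
    return [
        (i + 1, wt, var)
        for i, (wt, var) in enumerate(zip(wild_type.upper(), variant.upper()))
        if wt != var
    ]


print(point_mutations("ATGAAA", "ATGAGA"))  # [(5, 'A', 'G')]
```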

The term "homology" refers to a degree of complementarity between two
or more sequences. There may be partial homology or complete homology (i.e.,
identity). Homology is often measured using sequence analysis software (e.g.,
EMBOSS, the European Molecular Biology Open Software Suite, available at
http://ww.hgmp.mrc.ac.u^). Such software
matches similar sequences by assigning degrees of homology to various
substitutions, deletions, insertions, and other modifications. Conservative
substitutions typically include substitutions within the following groups:
glycine and alanine; valine, isoleucine and leucine; aspartic acid, glutamic acid,
asparagine and glutamine; serine and threonine; lysine and arginine; and
phenylalanine and tyrosine.
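The conservative-substitution groups listed above translate directly into a lookup. In this minimal sketch the residues are written in standard one-letter code (G/A; V/I/L; D/E/N/Q; S/T; K/R; F/Y), the helper names are hypothetical, and the two sequences are assumed to be already aligned without gaps. Alignment software generally grades substitutions with a scoring matrix rather than this all-or-nothing grouping.

```python
# Groups of mutually conservative residues in one-letter code, transcribed
# from the groups listed above.
CONSERVATIVE_GROUPS = [
    set("GA"), set("VIL"), set("DENQ"), set("ST"), set("KR"), set("FY"),
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues fall within the same conservative group."""
    original, replacement = original.upper(), replacement.upper()
    if original == replacement:
        return True
    return any(original in g and replacement in g for g in CONSERVATIVE_GROUPS)


def classify_substitutions(seq_a: str, seq_b: str) -> dict[str, int]:
    """Count identical, conservative and non-conservative positions in two
    aligned, equal-length protein sequences (gaps are not handled)."""
    counts = {"identical": 0, "conservative": 0, "nonconservative": 0}
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b:
            counts["identical"] += 1
        elif is_conservative(a, b):
            counts["conservative"] += 1
        else:
            counts["nonconservative"] += 1
    return counts


print(classify_substitutions("GVDSKF", "AIEWKY"))
# {'identical': 1, 'conservative': 4, 'nonconservative': 1}
```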

The term "isolated" when used in relation to a nucleic acid, as in "isolated
oligonucleotide" or "isolated polynucleotide" refers to a nucleic acid sequence
that is identified and separated from at least one contaminant with which it is
ordinarily associated in its source. Thus, an isolated nucleic acid is present in a
form or setting that is different from that in which it is found in nature. In
contrast, non-isolated nucleic acids (e.g., DNA and RNA) are found in the state
they exist in nature. For example, a given DNA sequence (e.g., a gene) is found
on the host cell chromosome in proximity to neighboring genes; RNA sequences
(e.g., a specific mRNA sequence encoding a specific protein) are found in the
cell as a mixture with numerous other mRNAs that encode a multitude of
proteins. However, isolated nucleic acid includes, by way of example, such
nucleic acid in cells ordinarily expressing that nucleic acid where the nucleic
acid is in a chromosomal location different from that of natural cells, or is
otherwise flanked by a different nucleic acid sequence than that found in nature.
The isolated nucleic acid or oligonucleotide may be present in single-stranded or
double-stranded form. When an isolated nucleic acid or oligonucleotide is to be
utilized to express a protein, the oligonucleotide contains, at a minimum, the
sense or coding strand (i.e., the oligonucleotide may be single-stranded), but may
contain both the sense and anti-sense strands (i.e., the oligonucleotide may be
double-stranded).

The term "isolated" when used in relation to a polypeptide, as in "isolated
protein" or "isolated polypeptide" refers to a polypeptide that is identified and
separated from at least one contaminant with which it is ordinarily associated in
its source. Thus, an isolated polypeptide is present in a form or setting that is
different from that in which it is found in nature. In contrast, non-isolated
polypeptides (e.g., proteins and enzymes) are found in the state they exist in
nature.

The term "purified" or "to purify" means the result of any process that
removes some of a contaminant from the component of interest, such as a protein
or nucleic acid. The percent of a purified component is thereby increased in the
sample.

The term "operably linked" as used herein refers to the linkage of nucleic
acid sequences in such a manner that a nucleic acid molecule capable of
directing the transcription of a given gene and/or the synthesis of a desired
protein molecule is produced. The term also refers to the linkage of sequences
encoding amino acids in such a manner that a functional (e.g., enzymatically
active, capable of binding to a binding partner, capable of inhibiting, etc.) protein
or polypeptide is produced.

The term "recombinant DNA molecule" means a hybrid DNA sequence
comprising at least two nucleotide sequences not normally found together in
nature.

The term "vector" is used in reference to nucleic acid molecules into
which fragments of DNA may be inserted or cloned, which can be used to transfer
DNA segment(s) into a cell, and which are capable of replication in a cell. Vectors may be
derived from plasmids, bacteriophages, viruses, cosmids, and the like.

The terms "recombinant vector" and "expression vector" as used herein
refer to DNA or RNA sequences containing a desired coding sequence and
appropriate DNA or RNA sequences necessary for the expression of the operably
linked coding sequence in a particular host organism. Prokaryotic expression
vectors include a promoter, a ribosome binding site, an origin of replication for
autonomous replication in a host cell and possibly other sequences, e.g., an
optional operator sequence and optional restriction enzyme sites. A promoter is
defined as a DNA sequence that directs RNA polymerase to bind to DNA and to
initiate RNA synthesis. Eukaryotic expression vectors include a promoter,
optionally a polyadenylation signal and optionally an enhancer sequence.

A polynucleotide having a nucleotide sequence encoding a protein or
polypeptide means a nucleic acid sequence comprising the coding region of a
gene, or in other words the nucleic acid sequence encodes a gene product. The
coding region may be present in either a cDNA, genomic DNA or RNA form.
When present in a DNA form, the oligonucleotide may be single-stranded (i.e.,
the sense strand) or double-stranded. Suitable control elements such as
enhancers/promoters, splice junctions, polyadenylation signals, etc. may be
placed in close proximity to the coding region of the gene if needed to permit
proper initiation of transcription and/or correct processing of the primary RNA
transcript. Alternatively, the coding region utilized in the expression vectors of
the present invention may contain endogenous enhancers/promoters, splice
junctions, intervening sequences, polyadenylation signals, etc. In further
embodiments, the coding region may contain a combination of both endogenous
and exogenous control elements.

WO 2006/034061    PCT/US2005/033218

The term "regulatory element" or "regulatory sequence" refers to a
genetic element or sequence that controls some aspect of the expression of
nucleic acid sequence(s). For example, a promoter is a regulatory element that
facilitates the initiation of transcription of an operably linked coding region.
Other regulatory elements include, but are not limited to, transcription factor
binding sites, splicing signals, polyadenylation signals, termination signals and
enhancer elements.

Transcriptional control signals in eukaryotes comprise "promoter" and
"enhancer" elements. Promoters and enhancers consist of short arrays of DNA
sequences that interact specifically with cellular proteins involved in
transcription. Promoter and enhancer elements have been isolated from a variety
of eukaryotic sources including genes in yeast, insect and mammalian cells.
Promoter and enhancer elements have also been isolated from viruses, and
analogous control elements, such as promoters, are also found in prokaryotes.
The selection of a particular promoter and enhancer depends on the cell type
used to express the protein of interest. Some eukaryotic promoters and
enhancers have a broad host range while others are functional in a limited subset
of cell types. For example, the SV40 early gene enhancer is very active in a wide
variety of cell types from many mammalian species and has been widely used for
the expression of proteins in mammalian cells. Other examples of
promoter/enhancer elements active in a broad range of mammalian cell types are
those from the human elongation factor 1α gene (Uetsuki et al., 1989; Kim et al.,
1990; and Mizushima and Nagata, 1990), the long terminal repeats of the
Rous sarcoma virus (Gorman et al., 1982), and the human cytomegalovirus
(Boshart et al., 1985).

The term "promoter/enhancer" denotes a segment of DNA containing
sequences capable of providing both promoter and enhancer functions (i.e., the
functions provided by a promoter element and an enhancer element as described
above). For example, the long terminal repeats of retroviruses contain both
promoter and enhancer functions. The enhancer/promoter may be "endogenous"
or "exogenous" or "heterologous." An "endogenous" enhancer/promoter is one
that is naturally linked with a given gene in the genome. An "exogenous" or
"heterologous" enhancer/promoter is one that is placed in juxtaposition to a gene
by means of genetic manipulation (i.e., molecular biological techniques) such
that transcription of the gene is directed by the linked enhancer/promoter.

The presence of "splicing signals" on an expression vector often results in
higher levels of expression of the recombinant transcript in eukaryotic host cells.
Splicing signals mediate the removal of introns from the primary RNA
transcript and consist of a splice donor and acceptor site (Sambrook et al., 1989).
A commonly used splice donor and acceptor site is the splice junction from the
16S RNA of SV40.

Efficient expression of recombinant DNA sequences in eukaryotic cells
requires expression of signals directing the efficient termination and
polyadenylation of the resulting transcript. Transcription termination signals are
generally found downstream of the polyadenylation signal and are a few hundred
nucleotides in length. The term "poly(A) site" or "poly(A) sequence" as used
herein denotes a DNA sequence which directs both the termination and
polyadenylation of the nascent RNA transcript. Efficient polyadenylation of the
recombinant transcript is desirable, as transcripts lacking a poly(A) tail are
unstable and are rapidly degraded. The poly(A) signal utilized in an expression
vector may be "heterologous" or "endogenous." An endogenous poly(A) signal
is one that is found naturally at the 3' end of the coding region of a given gene in
the genome. A heterologous poly(A) signal is one which has been isolated from
one gene and positioned 3' to another gene. A commonly used heterologous
poly(A) signal is the SV40 poly(A) signal. The SV40 poly(A) signal is
contained on a 237 bp BamHI/BclI restriction fragment and directs both
termination and polyadenylation (Sambrook et al., 1989).

Eukaryotic expression vectors may also contain "viral replicons" or "viral
origins of replication." Viral replicons are viral DNA sequences which allow for
the extrachromosomal replication of a vector in a host cell expressing the
appropriate replication factors. Vectors containing either the SV40 or polyoma
virus origin of replication replicate to high copy number (up to 10⁴ copies/cell) in
cells that express the appropriate viral T antigen. In contrast, vectors containing
the replicons from bovine papillomavirus or Epstein-Barr virus replicate
extrachromosomally at low copy number (about 100 copies/cell).

The term "in vitro" refers to an artificial environment and to processes or
reactions that occur within an artificial environment. In vitro environments
include, but are not limited to, test tubes and cell lysates. The term "in vivo"
refers to the natural environment (e.g., an animal or a cell) and to processes or
reactions that occur within a natural environment.

The term "expression system" refers to any assay or system for
determining (e.g., detecting) the expression of a gene of interest. Those skilled
in the field of molecular biology will understand that any of a wide variety of
expression systems may be used. A wide range of suitable mammalian cells are
available from a wide range of sources (e.g., the American Type Culture
Collection, Rockville, MD). The method of transformation or transfection and
the choice of expression vehicle will depend on the host system selected.
Transformation and transfection methods are described, e.g., in Ausubel et al.,
1992. Expression systems include in vitro gene expression assays where a gene
of interest (e.g., a reporter gene) is linked to a regulatory sequence and the
expression of the gene is monitored following treatment with an agent that
inhibits or induces expression of the gene. Detection of gene expression can be
through any suitable means including, but not limited to, detection of expressed
mRNA or protein (e.g., a detectable product of a reporter gene) or through a
detectable change in the phenotype of a cell expressing the gene of interest.
Expression systems may also comprise assays where a cleavage event or other
nucleic acid or cellular change is detected.

All amino acid residues identified herein are in the natural
L-configuration. In keeping with standard polypeptide nomenclature,
abbreviations for amino acid residues are as shown in the following Table of
Correspondence.

TABLE OF CORRESPONDENCE
L-leucine
L-threonine 
L-valine 
L-proline 
L-lysine 
L-histidine 
L-glutamine 
L-glutamic acid 
L-tryptophan 
L-arginine 
L-aspartic acid 
L-asparagine 
L-cysteine 

The terms "complementary" or "complementarity" are used in reference
to a sequence of nucleotides related by the base-pairing rules. For example,
the sequence 5'-A-G-T-3' is complementary to the sequence 3'-T-C-A-5'.
Complementarity may be "partial," in which only some of the nucleic acids'
bases are matched according to the base-pairing rules. Or, there may be
"complete" or "total" complementarity between the nucleic acids. The degree of
complementarity between nucleic acid strands has significant effects on the
efficiency and strength of hybridization between nucleic acid strands. This is of
particular importance in amplification reactions, as well as detection methods
which depend upon hybridization of nucleic acids.
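The base-pairing rule and the A-G-T / T-C-A example above can be checked mechanically. This Python sketch (helper names hypothetical; only A, C, G and T are handled) returns the complementary strand written 5'->3' and the fraction of positions that pair, which distinguishes complete from partial complementarity.

```python
# Watson-Crick pairing for the four DNA bases (ambiguity codes not handled).
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(seq.upper()))


def fraction_complementary(top_5to3: str, bottom_3to5: str) -> float:
    """Fraction of positions obeying the base-pairing rules when `top_5to3`
    (written 5'->3') is laid against `bottom_3to5` (written 3'->5')."""
    if len(top_5to3) != len(bottom_3to5):
        raise ValueError("strands must be the same length")
    pairs = zip(top_5to3.upper(), bottom_3to5.upper())
    matches = sum(PAIR[t] == b for t, b in pairs)
    return matches / len(top_5to3)


print(reverse_complement("AGT"))             # ACT, i.e. 3'-T-C-A-5' read 5'->3'
print(fraction_complementary("AGT", "TCA"))  # 1.0 (complete complementarity)
print(fraction_complementary("AGT", "TCG"))  # partial: 2 of 3 positions pair
```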

When used in reference to a double-stranded nucleic acid sequence such
as a cDNA or a genomic clone, the term "substantially homologous" refers to any
probe which can hybridize to either or both strands of the double-stranded
nucleic acid sequence under conditions of low stringency as described herein.

"Probe" refers to an oligonucleotide designed to be sufficiently
complementary to a sequence in a denatured nucleic acid to be probed (in
relation to its length) to be bound under selected stringency conditions.

"Hybridization" and "binding" in the context of probes and denatured
nucleic acids are used interchangeably. Probes that are hybridized or bound to
denatured nucleic acids are base paired to complementary sequences in the
polynucleotide. Whether or not a particular probe remains base paired with the
polynucleotide depends on the degree of complementarity, the length of the
probe, and the stringency of the binding conditions. The higher the stringency,
the higher must be the degree of complementarity and/or the longer the probe.

The term "hybridization" is used in reference to the pairing of
complementary nucleic acid strands. Hybridization and the strength of
hybridization (i.e., the strength of the association between nucleic acid strands) are
affected by many factors well known in the art, including the degree of
complementarity between the nucleic acids, stringency of the conditions
involved such as the concentration of salts, the Tm (melting temperature) of the
formed hybrid, the presence of other components (e.g., the presence or absence
of polyethylene glycol), the molarity of the hybridizing strands and the G:C
content of the nucleic acid strands.

The term "stringency" is used in reference to the conditions of
temperature, ionic strength, and the presence of other compounds, under which
nucleic acid hybridizations are conducted. With "high stringency" conditions,
nucleic acid base pairing will occur only between nucleic acid fragments that
have a high frequency of complementary base sequences. Thus, conditions of
"medium" or "low" stringency are often required when it is desired that nucleic
acids that are not completely complementary to one another be hybridized or
annealed together. It is well known in the art that numerous equivalent conditions can
be employed to comprise medium or low stringency conditions. The choice of
hybridization conditions is generally evident to one skilled in the art and is
usually guided by the purpose of the hybridization, the type of hybridization
(DNA-DNA or DNA-RNA), and the level of desired relatedness between the
sequences (e.g., Sambrook et al., 1989; Nucleic Acid Hybridization, A Practical
Approach, IRL Press, Washington D.C., 1985, for a general discussion of the
methods).

The stability of nucleic acid duplexes is known to decrease with
increasing numbers of mismatched bases, and further to be decreased to a greater
or lesser degree depending on the relative positions of mismatches in the hybrid
duplexes. Thus, the stringency of hybridization can be used to maximize or
minimize stability of such duplexes. Hybridization stringency can be altered by:
adjusting the temperature of hybridization; adjusting the percentage of helix
destabilizing agents, such as formamide, in the hybridization mix; and adjusting
the temperature and/or salt concentration of the wash solutions. For filter
hybridizations, the final stringency of hybridizations often is determined by the
salt concentration and/or temperature used for the post-hybridization washes.

"High stringency conditions" when used in reference to nucleic acid
hybridization comprise conditions equivalent to binding or hybridization at 42°C
in a solution consisting of 5X SSPE (43.8 g/l NaCl, 6.9 g/l NaH2PO4·H2O and
1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.5% SDS, 5X Denhardt's
reagent and 100 μg/ml denatured salmon sperm DNA followed by washing in a
solution comprising 0.1X SSPE, 1.0% SDS at 42°C when a probe of about 500
nucleotides in length is employed.

"Medium stringency conditions" when used in reference to nucleic acid
hybridization comprise conditions equivalent to binding or hybridization at 42°C
in a solution consisting of 5X SSPE (43.8 g/l NaCl, 6.9 g/l NaH2PO4·H2O and
1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.5% SDS, 5X Denhardt's
reagent and 100 μg/ml denatured salmon sperm DNA followed by washing in a
solution comprising 1.0X SSPE, 1.0% SDS at 42°C when a probe of about 500
nucleotides in length is employed.

"Low stringency conditions" comprise conditions equivalent to binding
or hybridization at 42°C in a solution consisting of 5X SSPE (43.8 g/l NaCl, 6.9
g/l NaH2PO4·H2O and 1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.1%
SDS, 5X Denhardt's reagent [50X Denhardt's contains per 500 ml: 5 g Ficoll
(Type 400, Pharmacia), 5 g BSA (Fraction V; Sigma)] and 100 μg/ml denatured
salmon sperm DNA followed by washing in a solution comprising 5X SSPE,
0.1% SDS at 42°C when a probe of about 500 nucleotides in length is employed.
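The buffer recipes above are given in grams per litre, and the three conditions differ mainly in the SSPE strength of the wash (0.1X, 1.0X and 5X). Converting to molar concentrations makes that stringency series easier to compare. This is a minimal Python sketch that assumes standard formula weights and assumes the EDTA is the disodium dihydrate salt; neither value comes from the text, and the helper names are hypothetical.

```python
# Formula weights (g/mol). The EDTA salt is an assumption: the text gives
# only "EDTA", and 1.85 g/l matches the disodium dihydrate at about 5 mM.
FORMULA_WEIGHTS = {
    "NaCl": 58.44,
    "NaH2PO4.H2O": 137.99,
    "Na2EDTA.2H2O": 372.24,
}

# Grams per litre of each component in 5X SSPE, as stated in the text.
SSPE_5X = {"NaCl": 43.8, "NaH2PO4.H2O": 6.9, "Na2EDTA.2H2O": 1.85}


def molarities(grams_per_litre: dict[str, float]) -> dict[str, float]:
    """Convert g/l to mol/l for each component."""
    return {name: g / FORMULA_WEIGHTS[name] for name, g in grams_per_litre.items()}


def at_strength(conc_5x: dict[str, float], strength_x: float) -> dict[str, float]:
    """Scale 5X concentrations to another strength, e.g. 0.1X for a wash."""
    return {name: c * strength_x / 5.0 for name, c in conc_5x.items()}


five_x = molarities(SSPE_5X)
for label, strength in [("high", 0.1), ("medium", 1.0), ("low", 5.0)]:
    nacl = at_strength(five_x, strength)["NaCl"]
    print(f"{label} stringency wash ({strength}X SSPE): {nacl:.3f} M NaCl")
```

The computed NaCl concentration of the wash falls about fifty-fold from the low- to the high-stringency condition, consistent with lower salt giving more stringent washes.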

The term "Tm" is used in reference to the "melting temperature". The
melting temperature is the temperature at which 50% of a population of
double-stranded nucleic acid molecules becomes dissociated into single strands.
The equation for calculating the Tm of nucleic acids is well known in the art.
The Tm of a hybrid nucleic acid is often estimated using a formula adopted from
hybridization assays in 1 M salt, and commonly used for calculating Tm for PCR
primers: [(number of A + T) x 2°C + (number of G + C) x 4°C] (C.R. Newton et
al., PCR, 2nd Ed., Springer-Verlag (New York, 1997), p. 24). This formula was
found to be inaccurate for primers longer than 20 nucleotides (Id.). Another
simple estimate of the Tm value may be calculated by the equation Tm = 81.5 +
0.41(% G + C), when a nucleic acid is in aqueous solution at 1 M NaCl (e.g.,
Anderson and Young, Quantitative Filter Hybridization, in Nucleic Acid
Hybridization, 1985). Other, more sophisticated computations exist in the art
which take structural as well as sequence characteristics into account for the
calculation of Tm. A calculated Tm is merely an estimate; the optimum
temperature is commonly determined empirically.
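Both simple estimates quoted above can be computed directly from base counts. The sketch applies them exactly as written; the function names are hypothetical, and, as the text cautions, either result is only a starting point for an empirically determined temperature.

```python
def wallace_tm(primer: str) -> float:
    """Tm estimate: 2 degC per A or T plus 4 degC per G or C.
    The text notes this rule is inaccurate for primers over 20 nucleotides."""
    seq = primer.upper()
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    return 2.0 * at_count + 4.0 * gc_count


def gc_tm(seq: str) -> float:
    """Tm estimate in 1 M NaCl: 81.5 + 0.41 * (% G + C)."""
    seq = seq.upper()
    percent_gc = 100.0 * (seq.count("G") + seq.count("C")) / len(seq)
    return 81.5 + 0.41 * percent_gc


primer = "AGCT" * 5  # 20-mer, 50% G + C
print(wallace_tm(primer))        # 60.0
print(round(gc_tm(primer), 1))   # 102.0
```

For the same 20-mer the two rules give very different values (60°C versus 102°C). The second expression, as quoted, has no length term, so it is better suited to long duplexes than to short primers.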

The term "sequence homology" means the proportion of base matches
between two nucleic acid sequences or the proportion of amino acid matches
between two amino acid sequences. When sequence homology is expressed as a
percentage, e.g., 50%, the percentage denotes the proportion of matches over the
length of sequence from one sequence that is compared to some other sequence.
Gaps (in either of the two sequences) are permitted to maximize matching; gap
lengths of 15 bases or less are usually used, 6 bases or less are preferred, and
2 bases or less are more preferred. When using oligonucleotides as probes or
treatments, the sequence homology between the target nucleic acid and the
oligonucleotide sequence is generally not less than 17 target base matches out of
20 possible oligonucleotide base pair matches (85%); preferably not less than 9
matches out of 10 possible base pair matches (90%), and more preferably not
less than 19 matches out of 20 possible base pair matches (95%).

Two amino acid sequences are homologous if there is a partial or
complete identity between their sequences. For example, 85% homology means
that 85% of the amino acids are identical when the two sequences are aligned for
maximum matching. Gaps (in either of the two sequences being matched) are
allowed in maximizing matching; gap lengths of 5 or less are preferred, with 2 or
less being more preferred. Alternatively and preferably, two protein sequences
(or polypeptide sequences derived from them of at least 100 amino acids in
length) are homologous, as this term is used herein, if they have an alignment
score of more than 5 (in standard deviation units) using the program ALIGN
with the mutation data matrix and a gap penalty of 6 or greater. See Dayhoff,
M. O., in Atlas of Protein Sequence and Structure, 1972, volume 5, National
Biomedical Research Foundation, pp. 101-110, and Supplement 2 to this
volume, pp. 1-10. The two sequences or parts thereof are more preferably
homologous if their amino acids are greater than or equal to 85% identical when
optimally aligned using the ALIGN program.

The following terms are used to describe the sequence relationships
between two or more polynucleotides: "reference sequence", "comparison
window", "sequence identity", "percentage of sequence identity", and
"substantial identity". A "reference sequence" is a defined sequence used as a
basis for a sequence comparison; a reference sequence may be a subset of a
larger sequence, for example, a segment of a full-length cDNA or gene
sequence given in a sequence listing, or may comprise a complete cDNA or gene
sequence. Generally, a reference sequence is at least 20 nucleotides in length,
frequently at least 25 nucleotides in length, and often at least 50 or 100
nucleotides in length. Because two polynucleotides may each (1) comprise a
sequence (i.e., a portion of the complete polynucleotide sequence) that is similar
between the two polynucleotides and (2) further comprise a sequence that is
divergent between the two polynucleotides, sequence comparisons between two
(or more) polynucleotides are typically performed by comparing sequences of the
two polynucleotides over a "comparison window" to identify and compare local
regions of sequence similarity.

A "comparison window", as used herein, refers to a conceptual segment
of at least 20 contiguous nucleotides, wherein the portion of the
polynucleotide sequence in the comparison window may comprise additions or
deletions (i.e., gaps) of 20 percent or less as compared to the reference sequence
(which does not comprise additions or deletions) for optimal alignment of the
two sequences.

Methods of aligning sequences for comparison are well known in the
art. Thus, the determination of percent identity between any two sequences can
be accomplished using a mathematical algorithm. Preferred, non-limiting
examples of such mathematical algorithms are the algorithm of Myers and Miller
(1988); the local homology algorithm of Smith and Waterman (1981); the
homology alignment algorithm of Needleman and Wunsch (1970); the
search-for-similarity method of Pearson and Lipman (1988); and the algorithm of
Karlin and Altschul (1990), modified as in Karlin and Altschul (1993).
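
To make the global-alignment idea concrete, the following is a minimal Python sketch of the dynamic-programming recurrence associated with Needleman and Wunsch (1970). The scoring values (match +1, mismatch -1, linear gap penalty -2) and the function name are illustrative assumptions, not the parameters or interface of ALIGN, GAP, or any other program named here, and when several alignments tie only one is returned.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b), with gaps written as "-".
    The default scores are illustrative, not those of any named program.
    """
    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # match or mismatch
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, aligning ACGT with AGT under these scores gives a score of 1, with a single gap placed opposite the C (ACGT over A-GT). Local alignment in the style of Smith and Waterman differs mainly in clamping cell scores at zero and tracing back from the highest-scoring cell.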

Computer implementations of these mathematical algorithms can be
utilized for comparison of sequences to determine sequence identity. Such
implementations include, but are not limited to: ClustalW (available, e.g., at
http://www.ebi.ac.uk/clustalw/); the ALIGN program (Version 2.0) and GAP,
BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software
Package, Version 8. Alignments using these programs can be performed using
the default parameters. The CLUSTAL program is well described by Higgins et
al. (1988); Higgins et al. (1989); Corpet et al. (1988); Huang et al. (1992); and
Pearson et al. (1994). The ALIGN program is based on the algorithm of Myers
and Miller, supra. The BLAST programs of Altschul et al. (1990) are based on
the algorithm of Karlin and Altschul, supra. To obtain gapped alignments for
comparison purposes, Gapped BLAST (in BLAST 2.0) can be utilized as
described in Altschul et al. (1997). Alternatively, PSI-BLAST (in BLAST 2.0)
can be used to perform an iterated search that detects distant relationships
between molecules. See Altschul et al., supra. When utilizing BLAST, Gapped
BLAST, or PSI-BLAST, the default parameters of the respective programs (e.g.,
BLASTN for nucleotide sequences, BLASTX for proteins) can be used. See
http://www.ncbi.nlm.nih.gov. Alignment may also be performed manually by
inspection.

The term "sequence identity" means that two polynucleotide sequences
are identical (i.e., on a nucleotide-by-nucleotide basis) over the window of
comparison. The term "percentage of sequence identity" means that two
polynucleotide sequences are identical (i.e., on a nucleotide-by-nucleotide basis)
for the stated proportion of nucleotides over the window of comparison. The
percentage of sequence identity is calculated by comparing two optimally
aligned sequences over the window of comparison, determining the number of
positions at which the identical nucleic acid base (e.g., A, T, C, G, U, or I) occurs
in both sequences to yield the number of matched positions, dividing the number
of matched positions by the total number of positions in the window of
comparison (i.e., the window size), and multiplying the result by 100 to yield the
percentage of sequence identity. The term "substantial identity" as used herein
denotes a characteristic of a polynucleotide sequence, wherein the polynucleotide
comprises a sequence that has at least 60%, preferably at least 65%, more
preferably at least 70%, up to about 85%, and even more preferably at least 90 to
95%, more usually at least 99%, sequence identity as compared to a reference
sequence over a comparison window of at least 20 nucleotide positions,
frequently over a window of at least 20-50 nucleotides, and preferably at least
300 nucleotides, wherein the percentage of sequence identity is calculated by
comparing the reference sequence to the polynucleotide sequence, which may
include deletions or additions totaling 20 percent or less of the reference
sequence over the window of comparison. The reference sequence may be a
subset of a larger sequence.
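
The calculation just defined can be written out directly. The Python sketch below assumes the two sequences have already been optimally aligned (for example, by one of the programs listed above) and uses "-" for gaps. It takes the window size to be the number of alignment columns; that is an assumed convention, since the definition does not say how gap columns enter the window size. The function names are hypothetical.

```python
def percent_identity(ref_aln: str, query_aln: str, gap: str = "-") -> float:
    """Percentage of sequence identity for two optimally aligned sequences.

    A position counts as matched when both sequences carry the same base there;
    gap-against-gap columns never count. The window size is taken to be the
    number of alignment columns (an assumed convention).
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    if not ref_aln:
        raise ValueError("empty comparison window")
    matches = sum(
        1 for r, q in zip(ref_aln.upper(), query_aln.upper()) if r == q and r != gap
    )
    return 100.0 * matches / len(ref_aln)


def gap_fraction(ref_aln: str, query_aln: str, gap: str = "-") -> float:
    """Additions plus deletions in the query, as a fraction of the ungapped reference length."""
    ref_len = sum(1 for r in ref_aln if r != gap)
    # A column is an addition or deletion when exactly one of the two sequences has a gap.
    indels = sum(1 for r, q in zip(ref_aln, query_aln) if (r == gap) != (q == gap))
    return indels / ref_len
```

With this convention, an ungapped 20-base oligonucleotide matching its target at 17 positions scores 85.0, the lower bound of the oligonucleotide criterion given earlier. The aligned pair ACGT / A-GT gives 75.0% identity and a gap fraction of 0.25, which exceeds the 20 percent limit on additions and deletions.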

As applied to polypeptides, the term "substantial identity" means that two
peptide sequences, when optimally aligned, such as by the programs GAP or
BESTFIT using default gap weights, share at least about 85% sequence identity,
preferably at least about 90% sequence identity, more preferably at least about
95% sequence identity, and most preferably at least about 99% sequence identity.

Synthetic Nucleotide Sequences and Methods of the Invention 

The invention provides compositions comprising synthetic nucleotide
sequences, as well as methods for preparing synthetic nucleotide sequences that
are efficiently expressed as a polypeptide or protein with desirable
characteristics, including reduced or no inappropriate or unintended
transcription characteristics when present in a particular cell type.

Natural selection is the hypothesis that genotype-environment
interactions occurring at the phenotypic level lead to differential reproductive
success of individuals and hence to modification of the gene pool of a
population. It is generally accepted that the amino acid sequence of a protein
found in nature has undergone optimization by natural selection. However,
some amino acids within the sequence of a protein do not contribute
significantly to the activity of the protein, and these amino acids can be changed
to other amino acids with little or no consequence. Furthermore, a protein may
be useful outside its natural environment or for purposes that differ from the
conditions of its natural selection. In these circumstances, the amino acid
sequence can be synthetically altered to better adapt the protein for its utility in
various applications.

Likewise, the nucleic acid sequence that encodes a protein is also
optimized by natural selection. The relationship between coding DNA and its
transcribed RNA is such that any change to the DNA affects the resulting RNA.
Thus, natural selection works on both molecules simultaneously. However, this
relationship does not exist between nucleic acids and proteins. Because multiple
codons encode the same amino acid, many different nucleotide sequences can
encode an identical protein. A specific protein composed of 500 amino acids can
theoretically be encoded by more than 10^150 different nucleic acid sequences.
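
The figure follows from multiplying, residue by residue, the number of codons available for each amino acid. The Python sketch below does this for the standard genetic code, with stop codons excluded; the function names are hypothetical. In the standard code only Met and Trp have a single codon, so a 500-residue protein containing neither has at least 2^500, or roughly 10^150.5, encodings. The all-lysine sequence in the test is a hypothetical stand-in, not a real protein.

```python
import math

# Number of sense codons for each amino acid in the standard genetic code (61 in total).
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encodings(protein: str) -> int:
    """Exact number of distinct coding sequences (stop codon excluded) for a protein."""
    return math.prod(CODONS_PER_AA[aa] for aa in protein.upper())


def log10_encodings(protein: str) -> float:
    """log10 of the number of encodings; avoids building huge integers for long proteins."""
    return sum(math.log10(CODONS_PER_AA[aa]) for aa in protein.upper())
```

For a real sequence, `log10_encodings` gives the exponent directly; proteins rich in six-codon residues (Leu, Ser, Arg) exceed the 2^500 lower bound by many orders of magnitude.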

Natural selection acts on nucleic acids to achieve proper encoding of the
corresponding protein. Presumably, other properties of nucleic acid molecules
are also acted upon by natural selection. These properties include codon usage
frequency, RNA secondary structure, the efficiency of intron splicing, and
interactions with transcription factors or other nucleic acid binding proteins.
These other properties may alter the efficiency of protein translation and the
resulting phenotype. Because of the redundant nature of the genetic code, these
other attributes can be optimized by natural selection without altering the
corresponding amino acid sequence.

Under some conditions, it is useful to synthetically alter the natural
nucleotide sequence encoding a protein to better adapt the protein for alternative
applications. A common example is to alter the codon usage frequency of a gene
when it is expressed in a foreign host. Although redundancy in the genetic code
allows amino acids to be encoded by multiple codons, different organisms favor
some codons over others. The codon usage frequencies tend to differ most for
organisms with widely separated evolutionary histories. It has been found that
when transferring genes between evolutionarily distant organisms, the efficiency
of protein translation can be substantially increased by adjusting the codon usage
frequency (see U.S. Patent Nos. 5,096,825, 5,670,356 and 5,874,304).

In one embodiment, the sequence of a reporter gene is modified because the
codon usage of reporter genes often does not correspond to the optimal codon
usage of the experimental cells. In another embodiment, the sequence of a
reporter gene is modified to remove regulatory sequences, such as those that
may alter expression of the reporter gene or a linked gene. Examples include the
β-galactosidase (β-gal) and chloramphenicol acetyltransferase (cat) reporter
genes, which are derived from E. coli and are commonly used in mammalian
cells; the β-glucuronidase (gus) reporter gene, which is derived from E. coli and
commonly used in plant cells; the firefly luciferase (luc) reporter gene, which is
derived from an insect and commonly used in plant and mammalian cells; and
the Renilla luciferase and green fluorescent protein (gfp) reporter genes, which
are derived from coelenterates and are commonly used in plant and mammalian
cells. To achieve sensitive quantitation of reporter gene expression, the activity
of the gene product must not be endogenous to the experimental host cells. Thus,
reporter genes are usually selected from organisms having unique and distinctive
phenotypes. Consequently, these organisms often have evolutionary histories
widely separated from those of the experimental host cells.

Previously, to create genes having a more optimal codon usage frequency
but still encoding the same gene product, a synthetic nucleic acid sequence was
made by replacing existing codons with codons that were generally more
favorable to the experimental host cell (see U.S. Patent Nos. 5,096,825,
5,670,356 and 5,874,304). The result was a net improvement in codon usage
frequency of the synthetic gene. However, the optimization of other attributes
was not considered, and so these synthetic genes likely did not reflect genes
optimized by natural selection.
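
For comparison with the method described later, the earlier frequency-only strategy amounts to swapping every codon for the host's most-used synonymous codon. The following is a minimal Python sketch of that strategy, assuming a codon usage table supplied as a dictionary. The function name is hypothetical, and the usage values in the test are placeholders, not measured frequencies for any organism. Nothing in the sketch checks for transcription factor binding sites or other regulatory sequences, which is the shortcoming described above.

```python
BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def optimize_codons(cds: str, usage: dict[str, float]) -> str:
    """Replace each codon with the synonymous codon that has the highest usage value.

    Amino acids none of whose codons appear in `usage` keep their original codon.
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    # Group codons by the amino acid they encode.
    synonyms: dict[str, list[str]] = {}
    for codon, aa in CODON_TABLE.items():
        synonyms.setdefault(aa, []).append(codon)
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        candidates = [c for c in synonyms[CODON_TABLE[codon]] if c in usage]
        out.append(max(candidates, key=usage.__getitem__) if candidates else codon)
    return "".join(out)
```

Because each codon is chosen independently, the output can easily contain new binding sites, splice signals, or restriction sites formed across codon boundaries.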

In particular, improvements in codon usage frequency are intended only
for optimization of an RNA sequence based on its role in translation into a
protein. Thus, previously described methods did not address how the sequence
of a synthetic gene affects the role of DNA in transcription into RNA. Most
notably, consideration had not been given to how transcription factors may
interact with the synthetic DNA and consequently modulate or otherwise
influence gene transcription. For genes found in nature, the DNA would be
optimally transcribed by the native host cell and would yield an RNA that
encodes a properly folded gene product. In contrast, synthetic genes have
previously not been optimized for transcriptional characteristics. Rather, this
property has been ignored or left to chance.

This concern is important for all genes, but particularly important for
reporter genes, which are most commonly used to quantitate transcriptional
behavior in the experimental host cells, and for vector backbone sequences that
carry genes. Hundreds of transcription factors have been identified in different
cell types under different physiological conditions, and likely more exist but
have not yet been identified. All of these transcription factors can influence the
transcription of an introduced gene or of sequences linked thereto. A useful
synthetic reporter gene or vector backbone of the invention has a minimal risk of
influencing or perturbing intrinsic transcriptional characteristics of the host cell
because the structure of that gene or vector backbone has been altered. A
particularly useful synthetic reporter gene or vector backbone will have desirable
characteristics under a new set and/or a wide variety of experimental conditions.
To best achieve these characteristics, the structure of the synthetic gene or
synthetic vector backbone should have minimal potential for interacting with
transcription factors within a broad range of host cells and physiological
conditions. Minimizing potential interactions between a reporter gene or vector
backbone and a host cell's endogenous transcription factors increases the value
of the reporter gene or vector backbone by reducing the risk of inappropriate
transcriptional characteristics of the gene or vector backbone within a particular
experiment, increasing applicability of the gene or vector backbone in various
environments, and increasing the acceptance of the resulting experimental data.

In contrast, a reporter gene comprising a native nucleotide sequence,
based on a genomic or cDNA clone from the original host organism, or a vector
backbone comprising native sequences found in one or a variety of different
organisms, may interact with transcription factors when present in an exogenous
host. This risk stems from two circumstances. First, the native nucleotide
sequence contains sequences that were optimized through natural selection to
influence gene transcription within the native host organism. However, these
sequences might also influence transcription when they are present in
exogenous hosts, i.e., out of context, thus interfering with the performance of the
reporter gene or vector backbone. Second, the nucleotide sequence may
inadvertently interact with transcription factors that were not present in the
native host organism, and thus did not participate in its natural selection. The
probability of such inadvertent interactions increases with greater evolutionary
separation between the experimental cells and the native organism of the reporter
gene or vector backbone.

These potential interactions with transcription factors would likely be
disrupted when using a synthetic reporter gene having alterations in codon usage
frequency. However, a synthetic reporter gene sequence designed by choosing
codons based only on codon usage frequency, or a vector backbone produced by
randomly replacing or randomly juxtaposing sequences, is likely to contain
other unintended transcription factor binding sites, since the resulting sequence
has not been subjected to the benefit of natural selection to correct inappropriate
transcriptional activities. Inadvertent interactions with transcription factors
could also occur whenever an encoded amino acid sequence is artificially altered,
e.g., to introduce amino acid substitutions. Similarly, these changes have not
been subjected to natural selection, and thus may exhibit undesired
characteristics.

Thus, the invention provides a method for preparing synthetic nucleotide
sequences that reduce the risk of undesirable interactions of the nucleotide
sequence with transcription factors and other trans-acting factors when expressed
in a particular host cell, thereby reducing inappropriate or unintended
characteristics. Preferably, the method yields synthetic genes containing
improved codon usage frequencies for a particular host cell and with a reduced
occurrence of regulatory sequences such as transcription factor binding sites,
and/or vector backbone sequences with a reduced occurrence of regulatory
sequences. The invention also provides a method of preparing synthetic genes
containing improved codon usage frequencies with a reduced occurrence of
transcription factor binding sites and additional beneficial structural attributes.
Such additional attributes include the absence of inappropriate RNA splicing
junctions, poly(A) addition signals, undesirable restriction enzyme recognition
sites, ribosomal binding sites, and/or secondary structural motifs such as hairpin
loops.

In one embodiment, a parent nucleic acid sequence encoding a
polypeptide is optimized for expression in a particular cell. For example, the
nucleic acid sequence is optimized by replacing codons in the wild-type
sequence with codons that are preferentially employed in a particular
(selected) cell, where the codon replacement also reduces the number of
regulatory sequences. Preferred codons have a relatively high codon usage
frequency in a selected cell, and preferably their introduction results in the
introduction of relatively few regulatory sequences, such as transcription factor
binding sites, and relatively few other undesirable structural attributes. Thus, the
optimized nucleotide sequence may have an improved level of expression due to
improved codon usage frequency, and a reduced risk of inappropriate
transcriptional behavior due to a reduced number of undesirable transcription
regulatory sequences. In another embodiment, a parent vector backbone
sequence is altered to remove regulatory sequences and, optionally, restriction
endonuclease sites, and optionally to retain or add other desirable
characteristics, e.g., the presence of one or more stop codons in one or more
reading frames, one or more poly(A) sites, and/or restriction endonuclease sites.
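
One of the optional backbone characteristics listed above, stop codons in one or more reading frames, is easy to check computationally. The Python sketch below reports stop-codon positions in all six reading frames of a DNA sequence using the standard stop codons TAA, TAG, and TGA. The frame labels and function names are illustrative assumptions, and positions in the reverse frames are counted on the reverse-complement strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def stop_codon_frames(seq: str) -> dict[str, list[int]]:
    """0-based start positions of stop codons in each of the six reading frames.

    Frames "+1".."+3" are read on the given strand; "-1".."-3" are read on the
    reverse complement, and their positions refer to that strand.
    """
    seq = seq.upper()
    strands = {"+": seq, "-": reverse_complement(seq)}
    result: dict[str, list[int]] = {}
    for sign, strand in strands.items():
        for frame in range(3):
            result[f"{sign}{frame + 1}"] = [
                i for i in range(frame, len(strand) - 2, 3) if strand[i:i + 3] in STOP_CODONS
            ]
    return result
```

A backbone segment intended to terminate translation in every frame would show a non-empty list for all six labels; an empty list flags a frame that still reads through.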

The invention may be employed with any nucleic acid sequence, e.g., a
native sequence such as a cDNA or one that has been manipulated in vitro.
Exemplary genes include, but are not limited to, those encoding lactamase
(β-gal), neomycin resistance (Neo), hygromycin resistance (Hyg), puromycin
resistance (Puro), ampicillin resistance (Amp), CAT, GUS, galactopyranoside,
GFP, xylosidase, thymidine kinase, arabinosidase, luciferase and the like. As
used herein, a "reporter gene" is a gene that imparts a distinct phenotype to cells
expressing the gene and thus permits cells having the gene to be distinguished
from cells that do not have the gene. Such genes may encode either a selectable
or screenable polypeptide, depending on whether the marker confers a trait
that one can "select" for by chemical means, i.e., through the use of a selective
agent (e.g., a herbicide, antibiotic, or the like), or whether it is simply a
"reporter" trait that one can identify through observation or testing, i.e., by
"screening." Also included within the terms selectable or screenable marker
genes are genes that encode a "secretable marker" whose secretion can be
detected as a means of identifying or selecting for transformed cells. Examples
include markers that encode a secretable antigen that can be identified by
antibody interaction, or even secretable enzymes that can be detected by their
catalytic activity. Secretable proteins fall into a number of classes, including
small, diffusible proteins detectable, e.g., by ELISA, and proteins that are
inserted or trapped in the cell membrane.

Elements of the present disclosure are exemplified in detail through the
use of particular genes and vector backbone sequences. Of course, many
examples of suitable genes and vector backbones are known to the art and can be
employed in the practice of the invention. Therefore, it will be understood that
the following discussion is exemplary rather than exhaustive. In light of the
techniques disclosed herein and the general recombinant techniques that are
known in the art, the present invention renders possible the alteration of any gene
or vector backbone sequence.

Exemplary genes include, but are not limited to, a neo gene, a puro gene,
an amp gene, a β-gal gene, a gus gene, a cat gene, a gpt gene, a hyg gene, a hisD
gene, a ble gene, a mprt gene, a bar gene, a nitrilase gene, a mutant acetolactate
synthase gene (ALS) or acetoacid synthase gene (AAS), a methotrexate-resistant
dhfr gene, a dalapon dehalogenase gene, a mutated anthranilate synthase gene
that confers resistance to 5-methyl tryptophan (WO 97/26366), an R-locus gene,
a β-lactamase gene, a xylE gene, an α-amylase gene, a tyrosinase gene, a
luciferase (luc) gene (e.g., a Renilla reniformis luciferase gene, a firefly
luciferase gene, or a click beetle luciferase (Pyrophorus plagiophthalamus)
gene), an aequorin gene, or a fluorescent protein gene.

The method of the invention can be performed by, although it is not
limited to, a recursive process. The process includes assigning preferred codons
to each amino acid in a target molecule, e.g., a native nucleotide sequence, based
on codon usage in a particular species; identifying potential transcription
regulatory sequences, such as transcription factor binding sites, in the nucleic
acid sequence having preferred codons, e.g., using a database of such binding
sites; optionally identifying other undesirable sequences; and substituting an
alternative codon (i.e., one encoding the same amino acid) at positions where
undesirable transcription factor binding sites or other sequences occur. To
produce codon-distinct versions, different alternative preferred codons are
substituted in each version. If necessary, the identification and elimination of
potential transcription factor binding sites or other undesirable sequences can be
repeated until a nucleotide sequence is achieved containing a maximum number
of preferred codons and a minimum number of undesired sequences, including
transcription regulatory sequences or other undesirable sequences. Also,
optionally, desired sequences, e.g., restriction enzyme recognition sites, can be
introduced. After a synthetic nucleotide sequence is designed and constructed,
its properties relative to the parent nucleic acid sequence can be determined by
methods well known to the art. For example, the expression of the synthetic and
target nucleic acids in a series of vectors in a particular cell can be compared.
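
The recursive process just described can be sketched as follows. This is a simplified Python illustration under several assumptions, not the procedure actually used: undesirable sequences are given as plain strings (real transcription factor binding sites are usually searched with position weight matrices, as in the databases named in this description), only the given strand is scanned, so reverse-complement motifs must be listed explicitly, and codon usage values are supplied by the caller. Function names are hypothetical. Each accepted substitution strictly lowers the number of motif occurrences, which guarantees that the loop terminates.

```python
BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_motifs(seq: str, motifs: list[str]) -> int:
    """Count (possibly overlapping) occurrences of every motif in seq."""
    return sum(
        1 for m in motifs for i in range(len(seq) - len(m) + 1) if seq.startswith(m, i)
    )


def design_sequence(protein: str, usage: dict[str, float], motifs: list[str]) -> str:
    """Start from preferred codons, then remove motifs by synonymous substitution."""
    def usage_of(codon: str) -> float:
        return usage.get(codon, 0.0)

    # Synonymous codons for each amino acid, most-used first (stable for ties).
    ranked: dict[str, list[str]] = {}
    for codon, aa in CODON_TABLE.items():
        ranked.setdefault(aa, []).append(codon)
    for aa in ranked:
        ranked[aa].sort(key=usage_of, reverse=True)

    codons = [ranked[aa][0] for aa in protein.upper()]
    while True:
        seq = "".join(codons)
        current = count_motifs(seq, motifs)
        if current == 0:
            break
        candidates = []  # (remaining motif count, -usage, codon index, new codon)
        for m in motifs:
            for start in range(len(seq) - len(m) + 1):
                if not seq.startswith(m, start):
                    continue
                # Only codons overlapping this occurrence are considered.
                for idx in range(start // 3, (start + len(m) - 1) // 3 + 1):
                    for alt in ranked[CODON_TABLE[codons[idx]]]:
                        if alt == codons[idx]:
                            continue
                        trial = codons[:idx] + [alt] + codons[idx + 1:]
                        remaining = count_motifs("".join(trial), motifs)
                        if remaining < current:
                            candidates.append((remaining, -usage_of(alt), idx, alt))
        if not candidates:
            break  # remaining sites cannot be removed by a single synonymous change
        _, _, idx, alt = min(candidates)  # fewest motifs left, then most-used codon
        codons[idx] = alt
    return "".join(codons)
```

For example, with hypothetical usage values that make GGA and TCC the preferred glycine and serine codons and GGC the second-ranked glycine codon, the dipeptide Gly-Ser is first encoded as GGATCC, a BamHI recognition site; listing "GGATCC" as an undesired motif makes the sketch switch to GGCTCC, which removes the site without changing the encoded dipeptide. Codon-distinct versions could be produced by running the same loop with different codon rankings.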

Thus, generally, the method of the invention comprises identifying a
target nucleic acid sequence and a host cell of interest, for example, a plant
(dicot or monocot), fungus, yeast or mammalian cell. Preferred host cells are
mammalian host cells such as CHO, COS, 293, HeLa, CV-1 and NIH3T3 cells.
Based on preferred codon usage in the host cell(s) and, optionally, low codon
usage in the host cell(s), e.g., high usage mammalian codons and low usage E.
coli and mammalian codons, codons to be replaced are determined.
Concurrently with, after, or before selecting codons to be replaced, desired and
undesired sequences, such as undesired transcriptional regulatory sequences, in
the target sequence are identified. These sequences, including transcriptional
regulatory sequences and restriction endonuclease sites, can be identified using
databases and software such as TRANSFAC® (Transcription Factor Database,
http://www.gene-regulation.com/), Match™ (http://www.gene-regulation.com/),
MatInspector (Genomatix, http://www.genomatix.de), EPD (Eukaryotic
Promoter Database, http://www.epd.isb-sib.ch/), REBASE® (Restriction Enzyme
Database, NEB, http://rebase.neb.com), TESS (Transcription Element Search
System, http://www.cbil.upenn.edu/tess/), MAR-Wiz (Futuresoft,
http://www.futuresoft.org), Lasergene® (DNASTAR, http://www.dnastar.com),
Vector NTI™ (Invitrogen, http://www.invitrogen.com), and Sequence
Manipulation Suite (http://www.bioinformatics.org/SMS/index.html).
Links to other databases and sequence analysis software are listed at
http://www.expasy.org/alinks.html. After one or more sequences are identified,
the modification(s) may be introduced. Once a desired synthetic nucleotide
sequence is obtained, it can be prepared by methods well known to the art (such
as nucleic acid amplification reactions with overlapping primers), and its
structural and functional properties can be compared to those of the target
nucleic acid sequence, including, but not limited to, percent homology, presence
or absence of certain sequences (for example, restriction sites), percent of codons
changed (such as an increased or decreased usage of certain codons) and/or
expression rates.
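
Several of the structural comparisons listed above are simple to compute once the parent and synthetic sequences are in hand. The following Python sketch assumes the two coding sequences are the same length and in the same reading frame, so no alignment step is needed; the function names are hypothetical. The two entries in the example site dictionary are the standard BamHI (GGATCC) and EcoRI (GAATTC) recognition sequences, chosen only for illustration.

```python
def percent_codons_changed(parent: str, synthetic: str) -> float:
    """Percentage of codon positions at which the synthetic sequence differs from the parent."""
    if not parent or len(parent) != len(synthetic) or len(parent) % 3:
        raise ValueError("sequences must be non-empty, equal-length, in-frame coding sequences")
    n_codons = len(parent) // 3
    changed = sum(
        1
        for i in range(0, len(parent), 3)
        if parent[i:i + 3].upper() != synthetic[i:i + 3].upper()
    )
    return 100.0 * changed / n_codons


def ungapped_identity(parent: str, synthetic: str) -> float:
    """Percent nucleotide identity for two equal-length sequences compared position by position."""
    if not parent or len(parent) != len(synthetic):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for p, s in zip(parent.upper(), synthetic.upper()) if p == s)
    return 100.0 * matches / len(parent)


def sites_present(seq: str, sites: dict[str, str]) -> set[str]:
    """Names of recognition sequences found on the given strand.

    Palindromic sites such as BamHI and EcoRI read the same on both strands,
    so scanning one strand suffices for them.
    """
    seq = seq.upper()
    return {name for name, site in sites.items() if site in seq}


SITES = {"BamHI": "GGATCC", "EcoRI": "GAATTC"}
```

For instance, the parent GGATCCAAATTA and the synthetic GGCTCCAAGCTG encode the same tetrapeptide (Gly-Ser-Lys-Leu), differ at three of four codons (75.0%), share about 66.7% nucleotide identity, and only the parent contains a BamHI site.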

As described below, the method was used to create synthetic reporter
genes encoding firefly luciferases and selectable polypeptides, and synthetic
sequences for vector backbones. Synthetic sequences may support greater levels
of expression and/or reduced aberrant expression relative to the corresponding
native or parent sequences for the protein. The native and parent sequences may
demonstrate anomalous transcription characteristics when expressed in
mammalian cells, which are likely not evident in the synthetic sequences.

Exemplary Uses of the Synthetic Nucleotide Sequences

The synthetic genes of the invention preferably encode the same proteins
as their native counterparts (or nearly so; it is recognized that a small number
of amino acid changes may be desired to enhance a property of the native
counterpart protein, e.g., to enhance luminescence of a luciferase), but have
improved codon usage while being largely devoid of regulatory elements in the
coding and noncoding regions. This increases the level of expression of the
protein the synthetic gene encodes and reduces the risk of anomalous expression
of the protein. For example, studies of many important events of gene
regulation, which may be mediated by weak promoters, are limited by
insufficient reporter signals from inadequate expression of the reporter proteins.
Also, the use of some selectable markers may be limited by the expression of
that marker in an exogenous cell. Thus, synthetic selectable marker genes that
have improved codon usage for that cell and fewer other undesirable sequences
(e.g., transcription factor binding sites) can permit the use of those markers in
cells that otherwise were undesirable as hosts for those markers.

Promoter crosstalk is another concern when a co-reporter gene is used to
normalize transfection efficiencies. With the enhanced expression of synthetic
genes, the amount of DNA containing strong promoters can be reduced, or DNA
containing weaker promoters can be employed, to drive the expression of the
co-reporter. In addition, there may be a reduction in the background expression
from the synthetic reporter genes of the invention. This characteristic makes
synthetic reporter genes more desirable by minimizing the sporadic expression
from the genes and reducing the interference resulting from other regulatory
pathways.

The use of reporter genes in imaging systems, which can be used for in
vivo biological studies or drug screening, is another use for the synthetic genes of
the invention. Because of its increased level of expression, the protein encoded
by a synthetic gene is more readily detectable by an imaging system. In fact,
using a synthetic Renilla luciferase gene, luminescence in transfected CHO cells
was detected visually without the aid of instrumentation.

In addition, the synthetic genes may be used to express fusion proteins,
for example fusions with secretion leader sequences or cellular localization
sequences, to study transcription in difficult-to-transfect cells such as primary
cells, and/or to improve the analysis of regulatory pathways and genetic
elements. Other uses include, but are not limited to, the detection of rare events
that require extreme sensitivity (e.g., studying RNA recoding); use with an
IRES; improving the efficiency of in vitro translation or in vitro
transcription-translation coupled systems such as TnT (Promega Corp.,
Madison, WI); study of reporters optimized to different host organisms (e.g.,
plants, fungi, and the like); use of multiple genes as co-reporters to monitor drug
toxicity; use as reporter molecules in multiwell assays; and use as reporter
molecules in drug screening, with the advantage of minimizing possible
interference with the reporter signal by different signal transduction pathways
and other regulatory mechanisms.

Additionally, uses for the synthetic nucleotide sequences of the invention
include fluorescence activated cell sorting (FACS); fluorescent microscopy;
detecting and/or measuring the level of gene expression in vitro and in vivo (e.g.,
to determine promoter strength); subcellular localization or targeting (fusion
protein); use as a marker, in calibration, or in a kit (e.g., for dual assays); in vivo
imaging; analysis of regulatory pathways and genetic elements; and use in
multi-well formats.

Further, although reporter genes are widely used to measure transcription
events, their utility can be limited by the fidelity and efficiency of reporter
expression. For example, in U.S. Patent No. 5,670,356, a firefly luciferase gene
(referred to as luc+) was modified to improve the level of luciferase expression.
While a higher level of expression was observed, it was not determined whether
the higher expression was accompanied by improved regulatory control.

The invention will be further described by the following nonlimiting 
examples. In particular, the synthetic nucleic acid molecules of the invention 
may be derived by other methods as well as by variations on the methods 
described herein. 

Example 1 

Synthetic Click Beetle (RD and GR) Luciferase Nucleic Acid Molecules
LucPplYG is a wild-type click beetle luciferase that emits yellow-green
luminescence (Wood, 1989). A mutant of LucPplYG named YG#81-6G01 was
envisioned. YG#81-6G01 lacks a peroxisome targeting signal, has a lower Km
for luciferin and ATP, and has increased signal stability and increased
temperature stability when compared to the wild type (WO 99/14336).
YG#81-6G01 was mutated to emit green luminescence by changing Ala at
position 224 to Val (A224V is a green-shifting mutation), or to emit red
luminescence by simultaneously introducing the amino acid substitutions A224H,
S247H, N346I, and H348Q (red-shifting mutation set) (WO 95/18853).

Using YG#81-6G01 as a parent gene, two synthetic gene sequences were
designed. One codes for a luciferase emitting green luminescence (GR) and one
for a luciferase emitting red luminescence (RD). Both genes were designed to 1)
have optimized codon usage for expression in mammalian cells, 2) have a
reduced number of transcriptional regulatory sites, including mammalian
transcription factor binding sites, splice sites, poly(A) sites and promoters, as
well as prokaryotic (E. coli) regulatory sites, 3) be devoid of unwanted restriction
sites, e.g., those which are likely to interfere with standard cloning procedures,
and 4) have a low DNA sequence identity to each other in order to
minimize genetic rearrangements when both are present inside the same cell. In
addition, desired sequences, e.g., a Kozak sequence or restriction enzyme
recognition sites, may be identified and introduced.

Not all design criteria could be met equally well at the same time. The
following priority was established for reduction of transcriptional regulatory
sites: elimination of transcription factor (TF) binding sites received the highest
priority, followed by elimination of splice sites and poly(A) sites, and finally
prokaryotic regulatory sites. When removing regulatory sites, the strategy was to
work from the least important to the most important to ensure that the most
important changes were made last. The sequence was then rechecked for the
appearance of new lower-priority sites, and additional changes were made as needed.
Thus, the process for designing the synthetic GR and RD gene sequences, using
the computer programs described herein, involved five optionally iterative steps,
followed by gene construction and modification, as detailed below:

1. Optimized codon usage and changed A224V to create GRver1;
separately changed A224H, S247H, H348Q and N346I to create
RDver1. These particular amino acid changes were maintained
throughout all subsequent manipulations to the sequence.

2. Removed undesired restriction sites, prokaryotic regulatory sites,
splice sites and poly(A) sites, thereby creating GRver2 and RDver2.

3. Removed transcription factor binding sites (first pass) and removed
any newly created undesired sites as listed in step 2 above, thereby
creating GRver3 and RDver3.

4. Removed transcription factor binding sites created by step 3 above
(second pass) and removed any newly created undesired sites as listed
in step 2 above, thereby creating GRver4 and RDver4.
5. Removed transcription factor binding sites created by step 4 above
(third pass) and confirmed absence of sites listed in step 2 above,
thereby creating GRver5 and RDver5.

6. Constructed the actual genes by PCR using synthetic oligonucleotides
corresponding to fragments of the GRver5 and RDver5 designed
sequences, thereby creating GR6 and RD7. GR6, upon sequencing,
was found to have the serine residue at amino acid position 49
mutated to an asparagine and the proline at amino acid position 230
mutated to a serine (S49N, P230S). RD7, upon sequencing, was
found to have the histidine at amino acid position 36 mutated to a
tyrosine (H36Y). These changes occurred during the PCR process.

7. The mutations described in step 6 above (S49N, P230S for GR6 and
H36Y for RD7) were reversed to create GRver5.1 and RDver5.1.

8. RDver5.1 was further modified by changing the arginine codon at
position 351 to a glycine codon (R351G), thereby creating RDver5.2,
with improved spectral properties compared to RDver5.1.

9. RDver5.2 was further mutated to increase luminescence intensity,
thereby creating RD156-1H9, which encodes four additional amino
acid changes (M2I, S349T, K488T, E538V) and three silent single
base changes (see U.S. application Serial No. 09/645,706, filed
August 24, 2000, the disclosure of which is incorporated by reference
herein).

1. Optimize codon usage and introduce mutations determining luminescence
color

The starting gene sequence for this design step was YG#81-6G01.
a) Optimize codon usage: 

The strategy was to adapt the codon usage for optimal expression in
human cells and at the same time to avoid E. coli low-usage codons. Based on
these requirements, the best two codons for expression in human cells for all
amino acids with more than two codons were selected (see Wada et al., 1990).
In the selection of codon pairs for amino acids with six codons, the selection was
biased towards pairs that have the largest number of mismatched bases to allow
design of GR and RD genes with minimum sequence identity (codon
distinction):

Arg: CGC/CGT Leu: CTG/TTG Ser: TCT/AGC

Thr: ACC/ACT Pro: CCA/CCT Ala: GCC/GCT

Gly: GGC/GGT Val: GTC/GTG Ile: ATC/ATT

Based on this selection of codons, two gene sequences encoding the YG#81-6G01
luciferase protein sequence were computer generated. The two genes were
designed to have minimum DNA sequence identity and at the same time closely
similar codon usage. To achieve this, each codon in the two genes was replaced
by a codon from the limited list described above in an alternating fashion (e.g.,
Arg(n) is CGC in gene 1 and CGT in gene 2, and Arg(n+1) is CGT in gene 1 and CGC
in gene 2).
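The alternating assignment described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the program actually used: the function name `alternate_codons` is hypothetical, and the sketch only knows the nine codon pairs listed above, so any other amino acid raises a `KeyError`.

```python
# Preferred codon pairs from the selection above.
PREFERRED = {
    "R": ("CGC", "CGT"), "L": ("CTG", "TTG"), "S": ("TCT", "AGC"),
    "T": ("ACC", "ACT"), "P": ("CCA", "CCT"), "A": ("GCC", "GCT"),
    "G": ("GGC", "GGT"), "V": ("GTC", "GTG"), "I": ("ATC", "ATT"),
}


def alternate_codons(protein: str) -> tuple[str, str]:
    """Back-translate one protein into two genes with alternating codons.

    At the n-th occurrence of an amino acid, gene 1 takes the first codon of
    its pair when n is even and the second when n is odd; gene 2 always takes
    the other codon, so the two genes never share that codon. Amino acids
    without a pair in PREFERRED raise KeyError (hypothetical sketch only).
    """
    seen: dict[str, int] = {}
    gene1: list[str] = []
    gene2: list[str] = []
    for aa in protein:
        first, second = PREFERRED[aa]
        n = seen.get(aa, 0)
        if n % 2 == 0:
            gene1.append(first)
            gene2.append(second)
        else:
            gene1.append(second)
            gene2.append(first)
        seen[aa] = n + 1
    return "".join(gene1), "".join(gene2)


g1, g2 = alternate_codons("RRLV")
print(g1)  # CGCCGTCTGGTC
print(g2)  # CGTCGCTTGGTG
```

Because both genes draw from the same codon pairs in equal measure, their overall codon usage stays closely similar while every alternated codon differs between them.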

For subsequent steps in the design process, it was anticipated that changes
would have to be made to this limited optimal codon selection in order to meet other
design criteria; however, the following low-usage codons in mammalian cells
were not used unless needed to meet criteria of higher priority:

Arg: CGA Leu: CTA Ser: TCG

Pro: CCG Val: GTA Ile: ATA
Also, the following low-usage codons in E. coli were avoided when reasonable
(note that 3 of these match the low-usage list for mammalian cells):

Arg: CGA/CGG/AGA/AGG

Leu: CTA Pro: CCC Ile: ATA
b) Introduce mutations determining luminescence color: 

The single green-shifting mutation was introduced into one of the two
codon-optimized gene sequences, and the four red-shifting mutations described
above were introduced into the other.

The two output sequences from this first design step were named GRver1
(version 1 GR) and RDver1 (version 1 RD). Their DNA sequences are 63%
identical (594 mismatches), while the proteins they encode differ only by the 4
amino acids that determine luminescence color (see Figures 2 and 3 for an
alignment of the DNA and protein sequences).
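Because both genes encode proteins of the same length, they can be compared codon by codon without gaps. A minimal sketch of the identity calculation, assuming such an ungapped alignment (the helper `compare` is hypothetical), together with a check of the figures above using the 1,626 bp coding length given for the designed genes:

```python
def compare(seq_a: str, seq_b: str) -> tuple[int, float]:
    """Return (mismatches, percent identity) for two ungapped, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    identity = 100.0 * (len(seq_a) - mismatches) / len(seq_a)
    return mismatches, identity


print(compare("ACGT", "ACGA"))  # (1, 75.0)

# Check against the text: 594 mismatches over a 1,626 bp gene.
print(round(100 * (1626 - 594) / 1626, 1))  # 63.5
```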

Tables 1 and 2 show, as an example, the codon usage for valine and 
leucine in human genes, the parent gene YG#81-6G01, the codon-optimized 




WO 2006/034061 



PCT/US2005/033218 



synthetic genes GRver1 and RDver1, as well as the final versions of the synthetic
genes after completion of step 5 in the design process (GRver5 and RDver5).



Table 1: Valine

Codon   Human   Parent   GRver1   RDver1   GRver5   RDver5
GTA     4       13       0        0        1        1
GTC     13      4        25       24       21       26
GTG     24      12       25       25       25       17
GTT     9       20       0        0        3        5

Table 2: Leucine

Codon   Human   Parent   GRver1   RDver1   GRver5   RDver5
CTA     3       5        0        0        0        0
CTC     12      4        0        1        12       11
CTG     24      4        28       27       19       18
CTT     6       12       0        0        1        1
TTA     3       17       0        0        0        0
TTG     6       13       27       27       23       25
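Gene-specific counts like those in Tables 1 and 2 can be tallied by reading a coding sequence in frame. The sketch below assumes an in-frame coding sequence that starts at its first codon; `codon_counts` is a hypothetical helper, and the short test sequence is invented for illustration.

```python
from collections import Counter


def codon_counts(cds: str, codons: list[str]) -> dict[str, int]:
    """Count in-frame occurrences of each listed codon in a coding sequence."""
    cds = cds.upper()
    in_frame = range(0, len(cds) - len(cds) % 3, 3)  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in in_frame)
    return {codon: counts.get(codon, 0) for codon in codons}


VALINE = ["GTA", "GTC", "GTG", "GTT"]
print(codon_counts("ATGGTCGTGGTCTAA", VALINE))
# {'GTA': 0, 'GTC': 2, 'GTG': 1, 'GTT': 0}
```

Summing a column gives the number of residues of that amino acid in the gene, which is a quick consistency check: the Valine columns for the parent and RD genes each sum to 49, and the GR columns to 50 because of the A224V change.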



2. Remove undesired restriction sites, prokaryotic regulatory sites, splice sites
and poly(A) sites

The starting gene sequences for this design step were GRver1 and RDver1.

a) Remove undesired restriction sites: 

To check for the presence and location of undesired restriction sites, the
sequences of both synthetic genes were compared against a database of
restriction enzyme recognition sequences (REBASE ver. 712,
http://www.neb.com/rebase) using standard sequence analysis software (GenePro
ver. 6.10, Riverside Scientific Ent.).

Specifically, the following restriction enzymes were classified as undesired:

- BamH I, Xho I, Sfi I, Kpn I, Sac I, Mlu I, Nhe I, Sma I, XTto I, Bgl II,
Hind III, Nco I, Nar I, Xba I, Hpa I, Sal I,

- other commonly used cloning sites: EcoR I, EcoR V, Cla I,

- eight-base cutters (commonly used for complex constructs),

- BsiE II (to allow N-terminal fusions),

- Xcm I (can generate an A/T overhang used for T-vector cloning).
To eliminate undesired restriction sites when found in a synthetic gene, one or
more codons of the synthetic gene sequence were altered in accordance with the
codon optimization guidelines described in 1a above.
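A minimal sketch of such a site scan, assuming plain exact-match searching (the GenePro software actually used is not reproduced here). Only a few of the listed enzymes are included, all with palindromic recognition sequences, and `find_sites` is a hypothetical helper; enzymes with degenerate or non-palindromic sites, such as Sfi I or Xcm I, would additionally need IUPAC handling and a reverse-strand search.

```python
import re

# Recognition sequences of some of the undesired enzymes listed above.
# All of these are palindromic, so scanning one strand also covers the other.
UNDESIRED = {
    "BamH I": "GGATCC", "Xho I": "CTCGAG", "Kpn I": "GGTACC",
    "Nco I": "CCATGG", "Xba I": "TCTAGA", "EcoR I": "GAATTC",
    "EcoR V": "GATATC", "Hind III": "AAGCTT", "Hpa I": "GTTAAC",
}


def find_sites(seq: str, sites: dict[str, str]) -> list[tuple[str, int]]:
    """Return (enzyme, 0-based start) for every recognition-site occurrence."""
    seq = seq.upper()
    hits = []
    for enzyme, site in sites.items():
        # A zero-width lookahead also reports overlapping occurrences.
        for match in re.finditer(f"(?={site})", seq):
            hits.append((enzyme, match.start()))
    return sorted(hits, key=lambda hit: hit[1])


print(find_sites("ttGGATCCaaGAATTC", UNDESIRED))
# [('BamH I', 2), ('EcoR I', 10)]
```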
b) Remove prokaryotic (E. coli) regulatory sequences: 
To check for the presence and location of prokaryotic regulatory
sequences, the sequences of both synthetic genes were searched for the presence
of the following consensus sequences using standard sequence analysis software
(GenePro):

- TATAAT (-10 Pribnow box of promoter)

- AGGA or GGAG (ribosome binding site; only considered if paired
with a methionine codon 12 or fewer bases downstream).

To eliminate such regulatory sequences when found in a synthetic gene, one or
more codons of the synthetic gene sequence were altered in accordance with
the codon optimization guidelines described in 1a above.
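The two prokaryotic checks can be sketched as follows. This is an illustration under assumptions not stated in the text: the 12-base window is measured from the 3' end of the AGGA/GGAG motif to the first base of the ATG, and an ATG in any reading frame counts. The function name is hypothetical.

```python
import re


def prokaryotic_hits(seq: str, max_gap: int = 12) -> list[tuple[str, int]]:
    """Flag -10 boxes (TATAAT) and RBS motifs (AGGA/GGAG) near an ATG."""
    seq = seq.upper()
    hits = [("TATAAT", m.start()) for m in re.finditer("(?=TATAAT)", seq)]
    for motif in ("AGGA", "GGAG"):
        for m in re.finditer(f"(?={motif})", seq):
            end = m.start() + len(motif)
            # Assumption: the ATG may start 0..max_gap bases after the motif,
            # in any reading frame.
            if "ATG" in seq[end:end + max_gap + 3]:
                hits.append((motif, m.start()))
    return sorted(hits, key=lambda hit: hit[1])


print(prokaryotic_hits("AAGGAGGTTTTATGC"))        # [('AGGA', 1), ('GGAG', 2)]
print(prokaryotic_hits("AGGA" + "C" * 20 + "ATG"))  # [] (ATG too far away)
```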
c) Remove splice sites:

To check for the presence and location of splice sites, the DNA strand
corresponding to the primary RNA transcript of each synthetic gene was
searched for the presence of the following consensus sequences (see Watson et
al., 1983) using standard sequence analysis software (GenePro):

- splice donor site: AG|GTRAGT (exon|intron); the search was
performed for AGGTRAG and the lower-stringency GGTRAGT;

- splice acceptor site: (Y)nNCAG|G (intron|exon); the search was
performed with n = 1.

To eliminate splice sites found in a synthetic gene, one or more codons of the
synthetic gene sequence were altered in accordance with the codon optimization
guidelines described in 1a above. Splice acceptor sites were generally difficult to
eliminate in one gene without introducing them into the other gene because they
tended to contain CAG, one of only two Gln codons; they were removed by
placing the Gln codon CAA in both genes at the expense of a slightly increased
sequence identity between the two genes.
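The IUPAC consensus patterns above translate directly into regular expressions. A minimal sketch, assuming exact matching on the sense strand only; the helper names and the mapping table are hypothetical and cover only the codes these patterns use.

```python
import re

# IUPAC codes needed for the consensus patterns above.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

SPLICE_PATTERNS = {
    "donor": "AGGTRAG",
    "donor (lower stringency)": "GGTRAGT",
    "acceptor (n = 1)": "YNCAGG",  # (Y)nNCAG|G with n = 1
}


def iupac_to_regex(pattern: str) -> str:
    return "".join(IUPAC[base] for base in pattern)


def find_splice_sites(sense: str) -> list[tuple[str, int]]:
    """Scan the transcribed (sense) strand only, as described above."""
    sense = sense.upper()
    hits = []
    for name, pattern in SPLICE_PATTERNS.items():
        for m in re.finditer(f"(?={iupac_to_regex(pattern)})", sense):
            hits.append((name, m.start()))
    return sorted(hits, key=lambda hit: hit[1])


print(find_splice_sites("CCAGGTAAGTC"))
# [('donor', 2), ('donor (lower stringency)', 3)]
print(find_splice_sites("TTCAGGA"))  # [('acceptor (n = 1)', 0)]
```

The acceptor example also shows why CAG codons were troublesome: any CAG followed by a G, with two suitable bases upstream, matches the n = 1 acceptor pattern.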
d) Remove poly(A) sites:

To check for the presence and location of poly(A) sites, the sequences of
both synthetic genes were searched for the presence of the following consensus
sequence using standard sequence analysis software (GenePro):

- AATAAA.

To eliminate each poly(A) addition site found in a synthetic gene, one or more
codons of the synthetic gene sequence were altered in accordance with the codon
optimization guidelines described in 1a above. The two output sequences from
this second design step were named GRver2 and RDver2. Their DNA sequences
are 63% identical (590 mismatches).

3. Remove transcription factor (TF) binding sites, then repeat steps 2 a-d 
The starting gene sequences for this design step were GRver2 and 
RDver2. 

To check for the presence, location and identity of potential TF binding sites, the
sequences of both synthetic genes were used as query sequences to search a
database of transcription factor binding sites (TRANSFAC v3.2). The
TRANSFAC database (http://transfac.gbf.de/TRANSFAC/index.html) holds
information on gene regulatory DNA sequences (TF binding sites) and proteins
(TFs) that bind to and act through them. The SITE table of TRANSFAC Release
3.2 contains 4,401 entries of individual (putative) TF binding sites, including TF
binding sites in eukaryotic genes, in artificial sequences resulting from
mutagenesis studies and in vitro selection procedures based on random
oligonucleotide mixtures or specific theoretical considerations, and consensus
binding sequences (from Faisst and Meyer, 1992).

The software tool used to locate and display these TF binding sites in the
synthetic gene sequences was TESS (Transcription Element Search Software,
http://agave.humgen.upenn.edu/tess/index.html). The filtered string-based
search option was used with the following user-defined search parameters:

- Factor Selection Attribute: Organism Classification

- Search Pattern: Mammalia

- Max. Allowable Mismatch %: 0

- Min. element length: 5

- Min. log-likelihood: 10 
This parameter selection specifies that only mammalian TF binding sites
(approximately 1,400 of the 4,401 entries in the database) that are at least 5 bases
long will be included in the search. It further specifies that only TF binding sites
that have a perfect match in the query sequence and a minimum log-likelihood
(LLH) score of 10 will be reported. The LLH scoring method assigns 2 to an
unambiguous match, 1 to a partially ambiguous match (e.g., A or T matching W)
and 0 to a match against N. For example, a search with the parameters specified
above would result in a "hit" (positive result or match) for TATAA (SEQ ID
NO:50) (LLH = 10), STRATG (SEQ ID NO:51) (LLH = 10), and
MTTNCNNMA (SEQ ID NO:52) (LLH = 10) but not for TRATG (SEQ ID NO:53)
(LLH = 9) if these four TF binding sites were present in the query sequence.
A lower-stringency test was performed at the end of the design process to
re-evaluate the search parameters.
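The LLH rule, together with the length and score cutoffs, can be expressed as a short function. This is a sketch of the scoring as described here, not TESS's own code; the function names are hypothetical, and three-base IUPAC codes (B, D, H, V) are deliberately rejected because the text does not say how they score.

```python
# Two-base IUPAC ambiguity codes ("partially ambiguous" in the text).
TWO_BASE = set("RYSWKM")


def llh(site: str) -> int:
    """Score 2 per unambiguous base, 1 per two-base code, 0 per N."""
    score = 0
    for base in site.upper():
        if base in "ACGT":
            score += 2
        elif base in TWO_BASE:
            score += 1
        elif base == "N":
            continue  # N contributes nothing
        else:
            # Three-base codes are not covered by the description above.
            raise ValueError(f"unsupported IUPAC code: {base}")
    return score


def reported(site: str, min_len: int = 5, min_llh: int = 10) -> bool:
    """Apply the minimum element length and minimum LLH cutoffs."""
    return len(site) >= min_len and llh(site) >= min_llh


for s in ("TATAA", "STRATG", "MTTNCNNMA", "TRATG"):
    print(s, llh(s), reported(s))
# TATAA 10 True
# STRATG 10 True
# MTTNCNNMA 10 True
# TRATG 9 False
```

Under the same rule, the sites MAMAG and CTKTK that appear in the lower-stringency search each score 8, which is consistent with their showing up only once the LLH cutoff was lowered to 8 or less.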

When TESS was tested with a mock query sequence containing known
TF binding sites, it was found that the program was unable to report matches to
sites ending at the 3' end of the query sequence. Thus, an extra nucleotide was
added to the 3' end of all query sequences to eliminate this problem.

The first search for TF binding sites using the parameters described
above found about 100 transcription factor binding sites (hits) for each of the two
synthetic genes (GRver2 and RDver2). All sites were eliminated by changing
one or more codons of the synthetic gene sequences in accordance with the
codon optimization guidelines described in 1a above. However, it was expected
that some of these changes created new TF binding sites, other regulatory sites, and
new restriction sites. Thus, steps 2 a-d were repeated as described, and 4 new
restriction sites and 2 new splice sites were removed. The two output sequences
from this third design step were named GRver3 and RDver3. Their DNA
sequences are 66% identical (541 mismatches).

4. Remove new transcription factor (TF) binding sites, then repeat steps 2 a-d
The starting gene sequences for this design step were GRver3 and 
RDver3. 
This fourth step is an iteration of the process described in step 3. The search for
newly introduced TF binding sites yielded about 50 hits for each of the two
synthetic genes. All sites were eliminated by changing one or more codons of
the synthetic gene sequences in general accordance with the codon optimization
guidelines described in 1a above. However, more high- to medium-usage codons
were used to allow elimination of all TF binding sites. The lowest priority was
placed on maintaining low sequence identity between the GR and RD genes.
Steps 2 a-d were then repeated as described. The two output sequences from
this fourth design step were named GRver4 and RDver4. Their DNA sequences
are 68% identical (506 mismatches).

5. Remove new transcription factor (TF) binding sites, then repeat steps 2 a-d 

The starting gene sequences for this design step were GRver4 and 
RDver4. 

This fifth step is another iteration of the process described in step 3 above. The
search for new TF binding sites introduced in step 4 yielded about 20 hits for
each of the two synthetic genes. All sites were eliminated by changing one or
more codons of the synthetic gene sequences in general accordance with the
codon optimization guidelines described in 1a above. However, more high- to
medium-usage codons were used (these are all considered "preferred") to allow
elimination of all TF binding sites. The lowest priority was placed on
maintaining low sequence identity between the GR and RD genes. Steps 2
a-d were then repeated as described. Only one acceptor splice site could not be
eliminated. As a final step, the absence of all TF binding sites in both genes, as
specified in step 3, was confirmed. The two output sequences from this fifth and
last design step were named GRver5 and RDver5. Their DNA sequences are
69% identical (504 mismatches).

Additional evaluation of GRver5 and RDver5
a) Use lower stringency parameters for TESS:

The search for TF binding sites was repeated as described in step 3 above, but
with even less stringent user-defined parameters:

- setting LLH to 9 instead of 10 did not result in new hits;

- setting LLH to 0 through 8 (inclusive) resulted in hits for two additional
sites, MAMAG (22 hits) and CTKTK (24 hits);

- setting LLH to 8 and the minimum element length to 4, the search
yielded (in addition to the two sites above) different 4-base sites for
AP-1, NF-1, and c-Myb that are shortened versions of their longer
respective consensus sites, which were eliminated in steps 3-5 above.

It was not realistic to attempt complete elimination of these sites without 

introduction of new sites, so no further changes were made. 

b) Search a different database:
The Eukaryotic Promoter Database (release 45) contains information about
reliably mapped transcription start sites (1253 sequences) of eukaryotic genes.
This database was searched using BLASTN 1.4.11 with default parameters
(optimized to find nearly identical sequences rapidly; see Altschul et al., 1990) at
the National Center for Biotechnology Information site
(http://www.ncbi.nlm.nih.gov/cgi-bin/BLAST). To test this approach, a portion
of the pGL3-Control vector sequence containing the SV40 promoter and enhancer
was used as a query sequence, yielding the expected hits to SV40 sequences. No
hits were found when using the two synthetic genes as query sequences.

Summary of GRver5 and RDver5 synthetic gene properties

Both genes, which at this stage were still only "virtual" sequences in the
computer, have a codon usage that strongly favors mammalian high-usage
codons and minimizes mammalian and E. coli low-usage codons.

Both genes are also completely devoid of eukaryotic TF binding sites
consisting of more than four unambiguous bases, donor and acceptor splice sites
(one exception: GRver5 contains one splice acceptor site), poly(A) sites, specific
prokaryotic (E. coli) regulatory sequences, and undesired restriction sites.

The gene sequence identity between GRver5 and RDver5 is only 69%
(504 base mismatches), while their encoded proteins are 99% identical (4 amino
acid mismatches). Their identity with the parent sequence YG#81-6G01 is 74%
(GRver5) and 73% (RDver5). Their base composition is 49.9% GC (GRver5)
and 49.5% GC (RDver5), compared to 40.2% GC for the parent YG#81-6G01.
Construction of synthetic genes

The two synthetic genes were constructed by assembly from synthetic
oligonucleotides in a thermocycler followed by PCR amplification of the full-length
genes (similar to Stemmer et al. (1995) Gene 164, pp. 49-53).
Unintended mutations that interfered with the design goals of the synthetic genes
were corrected.

a) Design of synthetic oligonucleotides: 

The synthetic oligonucleotides were mostly 40mers that collectively code
for both complete strands of each designed gene (1,626 bp) plus flanking regions
needed for cloning (1,950 bp total for each gene). The 5' and 3' boundaries of all
oligonucleotides specifying one strand were generally placed so as to give
an average offset/overlap of 20 bases relative to the boundaries of the
oligonucleotides specifying the opposite strand.
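The staggered layout can be sketched as follows. This is a simplified illustration under stated assumptions: fixed 40mers, an exact 20-base stagger, and no adjustment for hairpins or low-complexity ends (the actual design varied some oligonucleotide boundaries). The function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def tile_oligos(dsdna: str, length: int = 40, offset: int = 20) -> tuple[list[str], list[str]]:
    """Split both strands into oligos whose boundaries are staggered by offset.

    Top-strand oligos start at 0, length, 2*length, ...; bottom-strand
    oligos (written 5'->3') cover the same sequence with boundaries shifted
    by `offset`, so internal oligos overlap each opposite-strand partner by
    about `offset` bases. Oligos at the ends may be shorter. Assumes
    0 < offset < length.
    """
    top = [dsdna[i:i + length] for i in range(0, len(dsdna), length)]
    cuts = [0] + list(range(offset, len(dsdna), length)) + [len(dsdna)]
    bottom = [reverse_complement(dsdna[a:b]) for a, b in zip(cuts, cuts[1:])]
    return top, bottom


top, bottom = tile_oligos("ACGT" * 25)  # a 100 bp toy sequence
print([len(o) for o in top])     # [40, 40, 20]
print([len(o) for o in bottom])  # [20, 40, 40]
```

Each top-strand 40mer thus spans the junction between two bottom-strand oligos, which is what lets the fragments anneal and be extended into the full-length gene during thermocycler assembly.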

The ends of the flanking regions of both genes matched the ends of the
amplification primers (pRAMtailup: 5'-gtactgagac gacgccagcccaagcttaggcctgagtg,
SEQ ID NO:54, and pRAMtaildn: 5'-ggcatgagcgt gaactgactg aactagcggccgccgag,
SEQ ID NO:55) to allow cloning of the genes into our E. coli expression vector
pRAM (WO 99/14336).

A total of 183 oligonucleotides were designed: fifteen oligonucleotides
that collectively encode the upstream and downstream flanking sequences and
168 oligonucleotides (4 x 42) that encode both strands of the two genes.

All 183 oligonucleotides were run through the hairpin analysis of the
OLIGO software (OLIGO 4.0 Primer Analysis Software © 1989-1991 by
Wojciech Rychlik) to identify potentially detrimental intramolecular loop
formation. The guidelines for evaluating the analysis results were set according
to recommendations of Dr. Sims (Sigma-Genosys Custom Gene Synthesis
Department): oligos forming hairpins with ΔG < -10 have to be avoided, those
forming hairpins with ΔG < -7 involving the 3' end of the oligonucleotide should
also be avoided, while those with an overall ΔG < -5 should not pose a problem
for this application. The analysis identified 23 oligonucleotides able to form
hairpins with a ΔG between -7.1 and -4.9. Of these, 5 had blocked or nearly
blocked 3' ends (0-3 free bases) and were redesigned by removing 1-4 bases from
their 3' ends and adding them to the adjacent oligonucleotide.
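The two avoidance guidelines can be written as a small decision function. This is only a sketch of the stated thresholds: the function name is hypothetical, and ΔG values are assumed to be those reported by the OLIGO hairpin analysis (more negative means a more stable hairpin). It does not capture the additional judgment described above, where five oligonucleotides were redesigned on the basis of the number of free 3' bases.

```python
def hairpin_verdict(delta_g: float, involves_3prime_end: bool) -> str:
    """Classify a predicted hairpin using the two 'avoid' guidelines above."""
    if delta_g < -10:
        return "avoid"  # very stable hairpin, wherever it forms
    if delta_g < -7 and involves_3prime_end:
        return "avoid"  # moderately stable hairpin that can block extension
    return "acceptable"


print(hairpin_verdict(-7.1, True))   # avoid
print(hairpin_verdict(-7.1, False))  # acceptable
print(hairpin_verdict(-4.9, True))   # acceptable
```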
The 40mer oligonucleotide covering the sequence complementary to the
poly(A) tail had a very low complexity 3' end (13 consecutive T bases). An
additional 40mer was designed with a high complexity 3' end but a consequently
reduced overlap with one of its complementary oligonucleotides (11 instead of
20 bases) on the opposite strand.

Even though the oligonucleotides were designed for use in a
thermocycler-based assembly reaction, they could also be used in a ligation-based
protocol for gene construction. In this approach, the oligonucleotides are
annealed in a pairwise fashion and the resulting short double-stranded fragments
are ligated using the sticky overhangs. However, this would require that all
oligonucleotides be phosphorylated.
b) Gene assembly and amplification

In a first step, each of the two synthetic genes was assembled in a
separate reaction from 98 oligonucleotides. The total volume for each reaction
was 50 µl:

0.5 µM oligonucleotides (= 0.25 pmoles of each oligo)
1.0 U Taq DNA polymerase
0.02 U Pfu DNA polymerase
2 mM MgCl2
0.2 mM dNTPs (each)
0.1% gelatin

Cycling conditions: (94°C for 30 seconds, 52°C for 30
seconds, and 72°C for 30 seconds) x 55 cycles.
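The stated amount per oligonucleotide follows from the concentration if 0.5 µM is read as the combined concentration of all 98 oligonucleotides (an interpretation; the text does not say so explicitly). Because 1 µM equals 1 pmol/µl, the arithmetic is:

```python
def pmol_in_reaction(conc_um: float, volume_ul: float) -> float:
    """Amount in pmol: 1 uM = 1 pmol/ul, so pmol = uM x ul."""
    return conc_um * volume_ul


total = pmol_in_reaction(0.5, 50)  # all oligonucleotides together
per_oligo = total / 98             # 98 oligonucleotides per gene assembly
print(total, round(per_oligo, 3))  # 25.0 0.255
```

The result, about 0.26 pmol, matches the "0.25 pmoles of each oligo" given above to within rounding.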
In a second step, each assembled synthetic gene was amplified in a
separate reaction. The total volume for each reaction was 50 µl:

2.5 µl assembly reaction
5.0 U Taq DNA polymerase
0.1 U Pfu DNA polymerase
1 µM each primer (pRAMtailup, pRAMtaildn)
2 mM MgCl2
0.2 mM dNTPs (each)

Cycling conditions: (94°C for 20 seconds, 65°C for 60
seconds, 72°C for 3 minutes) x 30 cycles.
The assembled and amplified genes were subcloned into the pRAM
vector and expressed in E. coli, yielding 1-2% luminescent GR or RD clones.
Five GR and five RD clones were isolated and analyzed further. Of the five GR
clones, three had the correct insert size, of which one was weakly luminescent
and one had an altered restriction pattern. Of the five RD clones, two had the
correct insert size with an altered restriction pattern, and one of those was weakly
luminescent. Overall, the analysis indicated the presence of a large number of
mutations in the genes, most likely the result of errors introduced in the assembly
and amplification reactions.

c) Corrective assembly and amplification

To remove the large number of mutations present in the full-length
synthetic genes, we performed an additional assembly and amplification reaction
for each gene using the proofreading DNA polymerase Tli. The assembly
reaction contained, in addition to the 98 GR or RD oligonucleotides, a small
amount of DNA from the corresponding full-length clones with the mutations
described above. This allows the oligos to correct mutations present in the
templates.

The following assembly reaction was performed for each of the synthetic
genes. The total volume for each reaction was 50 µl:

0.5 µM oligonucleotides (= 0.25 pmoles of each oligo)
0.016 pmol plasmid (mix of clones with correct insert size)
2.5 U Tli DNA polymerase
2 mM MgCl2
0.2 mM dNTPs (each)
0.1% gelatin

Cycling conditions: 94°C for 30 seconds, then (94°C for
30 seconds, 52°C for 30 seconds, 72°C for 30 seconds) for
55 cycles, then 72°C for 5 minutes.
The following amplification reaction was performed on each of the
assembly reactions. The total volume for each amplification reaction was 50 µl:

1-5 µl of assembly reaction
40 pmol each primer (pRAMtailup, pRAMtaildn)



2.5 U Tli DNA polymerase
2 mM MgCl2
0.2 mM dNTPs (each)

Cycling conditions: 94°C for 30 seconds, then (94°C for
20 seconds, 65°C for 60 seconds and 72°C for 3 minutes)
for 30 cycles, then 72°C for 5 minutes.
The genes obtained from the corrective assembly and amplification step
were subcloned into the pRAM vector and expressed in E. coli, yielding 75%
luminescent GR or RD clones. Forty-four GR and 44 RD clones were analyzed
with the screening robot described in WO 99/14336. The six best GR and six best
RD clones were analyzed manually, and the single best GR clone and the single best
RD clone were selected (GR6 and RD7). Sequence analysis of GR6 revealed two
point mutations in the coding region, both of which resulted in an amino acid
substitution (S49N and P230S). Sequence analysis of RD7 revealed three point
mutations in the coding region, one of which resulted in an amino acid substitution
(H36Y). It was confirmed that none of the silent point mutations introduced any
regulatory or restriction sites conflicting with the overall design criteria for the
synthetic genes.

d) Reversal of unintended amino acid substitutions

The unintended amino acid substitutions present in the GR6 and RD7
synthetic genes were reversed by site-directed mutagenesis to match the GRver5
and RDver5 designed sequences, thereby creating GRver5.1 and RDver5.1. The
DNA sequences of the mutated regions were confirmed by sequence analysis.

e) Improve spectral properties

The RDver5.1 gene was further modified to improve its spectral
properties by introducing an amino acid change (R351G), thereby creating RDver5.2.

pGL3 vectors with RD and GR genes

The parent click beetle luciferase gene YG#81-6G01 ("YG") and the synthetic
click beetle luciferase genes GRver5.1 ("GR"), RDver5.2 ("RD"), and RD156-1H9
were cloned into the four pGL3 reporter vectors (Promega Corp.):

- pGL3-Basic = no promoter, no enhancer

- pGL3-Control = SV40 promoter, SV40 enhancer

- pGL3-Enhancer = SV40 enhancer (3' to luciferase coding sequences)

- pGL3-Promoter = SV40 promoter.

The primers employed in the assembly of GR and RD synthetic genes facilitated
the cloning of those genes into pRAM vectors. To introduce the genes into
pGL3 vectors (Promega Corp., Madison, WI) for analysis in mammalian cells,
each gene in a pRAM vector (pRAM RDver5.1, pRAM GRver5.1, and pRAM
RD156-1H9) was amplified to introduce an Nco I site at the 5' end and an Xba I
site at the 3' end of the gene. The primers for pRAM RDver5.1 and pRAM
GRver5.1 were:

GR: 5' GGA TCC CAT GGT GAA GCG TGA GAA 3' (SEQ ID NO:56) or
RD: 5' GGA TCC CAT GGT GAA ACG CGA 3' (SEQ ID NO:57) and
5' CTA GCT TTT TTT TCT AGA TAA TCA TGA AGA C 3' (SEQ ID NO:58)
The primers for pRAM RD156-1H9 were:

5' GCG TAG CCA TGG TAA AGC GTG AGA AAA ATG TC 3' (SEQ ID NO:59) and

5' CCG ACT CTA GAT TAC TAA CCG CCG GCC TTC ACC 3' (SEQ ID NO:60)

The PCR included:

100 ng DNA plasmid
1 µM primer upstream
1 µM primer downstream
0.2 mM dNTPs
1X buffer (Promega Corp.)
5 units Pfu DNA polymerase (Promega Corp.)
Sterile nanopure H2O to 50 µl

The cycling parameters were: 94°C for 5 minutes; (94°C for 30 seconds;
55°C for 1 minute; and 72°C for 3 minutes) x 15 cycles. The purified PCR
product was digested with Nco I and Xba I and ligated with pGL3-Control that was
also digested with Nco I and Xba I, and the ligated products were introduced into
E. coli. To insert the luciferase genes into the other pGL3 reporter vectors (Basic,
Promoter and Enhancer), the pGL3-Control vectors containing each of the
luciferase genes were digested with Nco I and Xba I and ligated with the other pGL3
vectors that also were digested with Nco I and Xba I, and the ligated products
were introduced into E. coli. Note that the polypeptide encoded by the GRver5.1 and
RDver5.1 (and RD156-1H9, see below) nucleic acid sequences in pGL3 vectors
has an amino acid substitution to valine at position 2 as a result of the Nco I site
at the initiation codon in the oligonucleotide.
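Why position 2 becomes valine can be checked by translating the primer sequences: in the Nco I site CCATGG, the base after ATG is fixed as G, so codon 2 must begin with G, and these primers use GTG or GTA (Val). A minimal sketch, assuming the standard genetic code and reading from the first CCATGG; the helper `translate_from_nco` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def translate_from_nco(seq: str) -> str:
    """Translate complete codons starting at the ATG inside the first CCATGG."""
    seq = seq.upper().replace(" ", "")
    start = seq.index("CCATGG") + 2  # the ATG begins 2 bases into CCATGG
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return "".join(CODE[c] for c in codons)


print(translate_from_nco("GGA TCC CAT GGT GAA GCG TGA GAA"))  # MVKRE (SEQ ID NO:56)
print(translate_from_nco("GGA TCC CAT GGT GAA ACG CGA"))      # MVKR (SEQ ID NO:57)
```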

Because of internal Nco I and Xba I sites, the native gene YG#81-6G01
was amplified from a Hind III site upstream to a Hpa I site downstream of
the coding region, in a fragment that included flanking sequences found in the GR and
RD clones. The upstream primer (5'-CAA AAA GCT TGG CAT TCC GGT
ACT GTT GGT AAA GCC ACC ATG GTG AAG CGA GAG-3'; SEQ ID
NO:61) and a downstream primer (5'-CAA TTG TTG TTG TTA ACT TGT
TTA TT-3'; SEQ ID NO:62) were mixed with YG#81-6G01 and amplified
using the PCR conditions above. The purified PCR product was digested with
Nco I and Xba I and ligated with pGL3-Control that was also digested with Hind III
and Hpa I, and the ligated products were introduced into E. coli. To insert YG#81-6G01
into the other pGL3 reporter vectors (Basic, Promoter and Enhancer), the
pGL3-Control vectors containing YG#81-6G01 were digested with Nco I and
Xba I and ligated with the other pGL3 vectors that also were digested with Nco I
and Xba I, and the ligated products were introduced into E. coli. Note that the clone of
YG#81-6G01 in the pGL3 vectors has a C instead of an A at base 786, which
yields a change in the amino acid sequence at residue 262 from Phe to Leu. To
determine whether the altered amino acid at position 262 affected the enzyme
biochemistry, the clone of YG#81-6G01 was mutated to resemble the original
sequence. Both clones were then tested for expression in E. coli, physical
stability, substrate binding, and luminescence output kinetics. No significant
differences were found.

Partially purified enzymes expressed from the synthetic genes and the 
parent gene were employed to determine Km for luciferin and ATP (see Table 
3). 
Table 3

Enzyme      Km (LH2)   Km (ATP)
YG parent   2 mM       17 mM
GR          1.3 mM     25 nM
RD          24.5 nM    46 nM



In vitro eukaryotic transcription/translation reactions were also conducted
using Promega's TNT T7 Quick system according to the manufacturer's
instructions. Luminescence levels were 1- to 37-fold and 1- to 77-fold higher
(depending on the reaction time) for the synthetic GR and RD genes,
respectively, compared to the parent gene (corrected for luminometer spectral
sensitivity).

To test whether the synthetic click beetle luciferase genes have improved expression in mammalian cells compared with the wild type click beetle gene, each of the synthetic genes and the parent gene was cloned into a series of pGL3 vectors and introduced into CHO cells (Table 8). In all cases, the synthetic click beetle genes exhibited higher expression than the native gene. Specifically, expression of the synthetic GR and RD genes was 1900-fold and 40-fold higher, respectively, than that of the parent gene (transfection efficiency normalized by comparison to the native Renilla luciferase gene). Moreover, the data (basic versus control vector) show that the synthetic genes have reduced basal level transcription.

Further, in experiments with the enhancer vector, where the percentage of activity relative to the control vector is compared between the native and synthetic genes, the data showed that the synthetic genes have a reduced risk of anomalous transcription characteristics. In particular, the parent gene appeared to contain one or more internal transcriptional regulatory sequences that are activated by the enhancer in the vector, and thus is not suitable as a reporter gene, while the synthetic GR and RD genes showed a clean reporter response (transfection efficiency normalized by comparison to the native Renilla luciferase gene). See Table 8.
Example 2

Synthetic Renilla Luciferase Nucleic Acid Molecule 
The synthetic Renilla luciferase genes prepared include 1) an introduced Kozak sequence, 2) codon usage optimized for mammalian (human) expression, 3) a reduction or elimination of unwanted restriction sites, 4) removal of prokaryotic regulatory sites (ribosome binding site and TATA box), 5) removal of splice sites and poly(A) sites, and 6) a reduction or elimination of mammalian transcription factor binding sequences.

The process of computer-assisted design of synthetic Renilla luciferase genes by iterative rounds of codon optimization and removal of transcription factor binding sites and other regulatory sites as well as restriction sites can be described in three steps, followed by construction of the gene:

1. Using the wild type Renilla luciferase gene as the parent gene, codon usage was optimized, one amino acid was changed (T→A) to generate a Kozak consensus sequence, and undesired restriction sites were eliminated, thereby creating synthetic gene Rlucver1.

2. Prokaryotic regulatory sites, splice sites, poly(A) sites and transcription factor (TF) binding sites were removed (first pass). Newly created TF binding sites were then removed. Finally, newly created undesired restriction enzyme sites, prokaryotic regulatory sites, splice sites, and poly(A) sites were removed without introducing new TF binding sites. This created Rlucver2.

3. Three bases of Rlucver2 were changed, thereby creating Rluc-final.

4. The actual gene was then constructed from synthetic oligonucleotides corresponding to the Rluc-final designed sequence. All mutations resulting from the assembly or PCR process were corrected. This gene is Rluc-final.

Codon Selection 

Starting with the Renilla reniformis luciferase sequence in Genbank (Accession No. M63501), codons were selected based on codon usage for optimal expression in human cells and to avoid E. coli low-usage codons. The best codon for expression in human cells (or the best two codons if found at a similar frequency) was chosen for all amino acids with more than one codon (Wada et al., 1990):
Arg: CGC        Lys: AAG
Leu: CTG        Asn: AAC
Ser: AGC        His: CAC
Thr: ACC        Gln: CAG
Pro: CCA/CCT    Glu: GAG
Ala: GCC        Asp: GAC
Gly: GGC        Tyr: TAC
Val: GTG        Cys: TGC
Ile: ATC/ATT    Phe: TTC


In cases where two codons were selected for one amino acid, they were used in an alternating fashion. To meet other criteria for the synthetic gene, the initial optimal codon selection was later modified to some extent. For example, introduction of a Kozak sequence required the use of GCT for Ala at amino acid position 2 (see below).
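The selection rule above (one preferred codon per amino acid, alternating between two codons where two were chosen) can be sketched as a simple back-translation. This is an illustrative sketch only: `back_translate` is a hypothetical helper, the dictionary assumes the codon assignments listed above, and the later adjustments (GCT at position 2 for the Kozak sequence, removal of restriction and TF binding sites) are not modeled.

```python
from itertools import cycle

# Preferred codon(s) per amino acid, following the list above. Where two codons
# were selected (Pro, Ile), they are cycled so that successive uses alternate.
PREFERRED = {
    "R": ["CGC"], "K": ["AAG"], "L": ["CTG"], "N": ["AAC"],
    "S": ["AGC"], "H": ["CAC"], "T": ["ACC"], "Q": ["CAG"],
    "P": ["CCA", "CCT"], "E": ["GAG"], "A": ["GCC"], "D": ["GAC"],
    "G": ["GGC"], "Y": ["TAC"], "V": ["GTG"], "C": ["TGC"],
    "I": ["ATC", "ATT"], "F": ["TTC"],
    "M": ["ATG"], "W": ["TGG"],  # amino acids with a single codon
}

def back_translate(protein: str) -> str:
    """Return a DNA sequence built from the preferred codon(s) for each residue."""
    pools = {aa: cycle(codons) for aa, codons in PREFERRED.items()}
    return "".join(next(pools[aa]) for aa in protein.upper())

if __name__ == "__main__":
    print(back_translate("MASKPIPI"))  # ATGGCCAGCAAGCCAATCCCTATT
```

For example, `back_translate("PPP")` gives CCACCTCCA, showing the alternation between the two Pro codons.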

The following low-usage codons in mammalian cells were not used unless needed: Arg: CGA, CGT; Leu: CTA, TTA; Ser: TCG; Pro: CCG; Val: GTA; and Ile: ATA. The following low-usage codons in E. coli were also avoided when reasonable (note that 3 of these match the low-usage list for mammalian cells): Arg: CGA/CGG/AGA/AGG; Leu: CTA; Pro: CCC; Ile: ATA.
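Screening a coding sequence for these avoided codons amounts to an in-frame scan. The sketch below is hypothetical (the names `MAMMALIAN_LOW`, `ECOLI_LOW` and `low_usage_hits` are not from the source); it writes the codons with T rather than U and also confirms the overlap of three codons between the two lists noted above.

```python
# Low-usage codons named in the text, written with T instead of U.
MAMMALIAN_LOW = {"CGA", "CGT", "CTA", "TTA", "TCG", "CCG", "GTA", "ATA"}
ECOLI_LOW = {"CGA", "CGG", "AGA", "AGG", "CTA", "CCC", "ATA"}

def low_usage_hits(cds, avoid):
    """Return (1-based position, codon) for each in-frame codon found in `avoid`."""
    cds = cds.upper()
    return [(i + 1, cds[i:i + 3])
            for i in range(0, len(cds) - 2, 3)
            if cds[i:i + 3] in avoid]

if __name__ == "__main__":
    print(sorted(MAMMALIAN_LOW & ECOLI_LOW))  # ['ATA', 'CGA', 'CTA']
    print(low_usage_hits("ATGCGAAAGATA", MAMMALIAN_LOW | ECOLI_LOW))  # [(4, 'CGA'), (10, 'ATA')]
```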

Introduction of Kozak Sequences

The Kozak sequence 5' aaccATGGCT 3' (SEQ ID NO:63), which contains the Nco I site (CCATGG) and in which the coding region is shown in capital letters, was introduced into the synthetic Renilla luciferase gene. The introduction of the Kozak sequence changes the second amino acid from Thr to Ala (GCT).
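The reason the second codon had to change can be made concrete. In the usual description of Kozak's consensus, the strongest contexts have a purine at position -3 and a G at position +4 (counting the A of ATG as +1); a G at +4 requires the second codon to begin with G, which rules out Thr (ACN) but allows Ala (GCN). The hypothetical helper below checks these two positions and locates the Nco I site; it is a sketch, not a tool used in the original work.

```python
def kozak_check(seq):
    """Inspect the first ATG: purine at -3, G at +4, Nco I site, and the second codon."""
    s = seq.upper()
    atg = s.find("ATG")
    if atg < 3 or atg + 6 > len(s):
        raise ValueError("need 3 nt upstream of ATG and a complete second codon")
    return {
        "purine_at_minus3": s[atg - 3] in "AG",  # A of ATG counted as +1
        "G_at_plus4": s[atg + 3] == "G",
        "nco1_site": s.find("CCATGG"),           # 0-based index, -1 if absent
        "codon2": s[atg + 3:atg + 6],
    }

if __name__ == "__main__":
    print(kozak_check("aaccATGGCT"))  # designed context: Ala (GCT) at codon 2
    print(kozak_check("aaccATGACT"))  # same context with Thr (ACT) at codon 2
```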
Removal of undesired restriction sites

REBASE ver. 808 (updated August 1, 1998; Restriction Enzyme Database; www.neb.com/rebase) was employed to identify undesirable restriction sites as described in Example 1. The following undesired restriction sites (in addition to those described in Example 1) were removed according to the process described in Example 1: EcoICR I, Nde I, Nsi I, Sph I, Spe I, Xma I, Pst I.

The version of Renilla luciferase (Rluc) that incorporates all these changes is Rlucver1.
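Locating such sites is a plain string search over the recognition sequences. The sketch below illustrates the idea under stated assumptions rather than reproducing the REBASE-based procedure: `find_sites` is hypothetical, and the recognition sequences are the standard ones for these enzymes (EcoR V, which remained in Rlucver2, is included for comparison). All of them are palindromic, so scanning one strand finds sites on both; a non-palindromic site would also require scanning the reverse complement.

```python
# Standard recognition sequences (all palindromic, so one strand suffices).
SITES = {
    "EcoICR I": "GAGCTC", "Nde I": "CATATG", "Nsi I": "ATGCAT",
    "Sph I": "GCATGC", "Spe I": "ACTAGT", "Xma I": "CCCGGG",
    "Pst I": "CTGCAG", "EcoR V": "GATATC",
}

def find_sites(seq, sites=SITES):
    """Map enzyme name -> 1-based start positions of its recognition site."""
    s = seq.upper()
    hits = {}
    for name, motif in sites.items():
        positions = [i + 1 for i in range(len(s) - len(motif) + 1)
                     if s[i:i + len(motif)] == motif]
        if positions:
            hits[name] = positions
    return hits

if __name__ == "__main__":
    print(find_sites("AAGATATCCTGCAGTT"))  # {'Pst I': [9], 'EcoR V': [3]}
```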
Removal of prokaryotic (E. coli) regulatory sequences, splice sites, and poly(A) sites

The priority and process for eliminating transcription regulation sites were as described in Example 1.
Removal of TF binding sites

The same process, tools, and criteria were used as described in Example 1, except that the newer version 3.3 of the TRANSFAC database was employed.

After removing prokaryotic regulatory sequences, splice sites and poly(A) sites from Rlucver1, the first search for TF binding sites identified about 60 hits. All sites were eliminated with the exception of three that could not be removed without altering the amino acid sequence of the synthetic Renilla gene:

1. a site at position 63, composed of two codons for W (TGGTGG), for CAC-binding protein T00076;

2. a site at position 522, composed of codons for KMV (AANATGGT), for myc-DF1 T00517;

3. a site at position 885, composed of codons for EMG (GARATGGGN), for myc-DF1 T00517.

The subsequent second search for (newly introduced) TF binding sites yielded about 20 hits. All new sites were eliminated, leaving only the three sites described above. Finally, any newly introduced restriction sites, prokaryotic regulatory sequences, splice sites and poly(A) sites were removed, without introducing new TF binding sites where possible. The result was Rlucver2.

As in Example 1, lower stringency search parameters were specified for the TESS filtered string search to further evaluate the synthetic Renilla gene.

With the Min LLH reduced from 10 to 9 and the minimum element length reduced from 5 to 4, the TESS filtered string search did not show any new hits. When, in addition to these parameter changes, the organism classification was expanded from "mammalia" to "chordata", the search yielded only four more TF binding sites. When the Min LLH was further reduced to between 8 and 0, the search showed two additional 5-base sites (MAMAG and CTKTK), which combined had four matches in Rlucver2, as well as several 4-base sites. Also as in Example 1, Rlucver2 was checked for hits to entries in the EPD (Eukaryotic Promoter Database, Release 45). Three hits were found: one to the Mus musculus H-2Ld promoter (Cell, 44, 261 (1986)), one to the Herpes Simplex Virus type 1 b'g'2.7 kb promoter, and one to the Homo sapiens DHFR promoter (J. Mol. Biol., 176, 169 (1984)). However, no further changes were made to Rlucver2.



Summary of Properties for Rlucver2 

- all 30 low-usage codons were eliminated;
- the introduction of a Kozak sequence changed the second amino acid from Thr to Ala;
- base composition: 55.7% GC (Renilla wild-type parent gene: 36.5%);
- one undesired restriction site could not be eliminated: EcoR V at position 488;
- the synthetic gene had no prokaryotic promoter sequence, but one potentially functional ribosome binding site (RBS) at positions 867-873 (about 13 bases upstream of a Met codon) could not be eliminated;
- all poly(A) sites were eliminated;
- splice sites: 2 donor splice sites could not be eliminated (both share the amino acid sequence MGK);
- TF sites: all sites with a consensus of >4 unambiguous bases were eliminated (about 280 TF binding sites were removed), with 3 exceptions due to the preference to avoid changes to the amino acid sequence.
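The GC content quoted above is the fraction of G and C among all bases of the sequence. A minimal sketch (the function name is hypothetical; non-ACGT characters such as spaces are ignored):

```python
def gc_percent(seq):
    """Percentage of G + C among the A/C/G/T characters of seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)

if __name__ == "__main__":
    # Example input: the sense-strand assembly primer given below.
    print(gc_percent("AACCATGGCTTCCAAGGTGTACGACCCCGAGCAACGCAAA"))
```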

When introduced into pGL3, Rluc-final has a Kozak sequence (CACCATGGCT; SEQ ID NO:65). The changes in Rluc-final relative to Rlucver2 were introduced during gene assembly. One change was at position 619, a C to an A, which eliminated a eukaryotic promoter sequence and reduced the stability of a hairpin structure in the corresponding oligonucleotide employed to assemble the gene. Other changes included a change from CGC to AGA at positions 218-220, which resulted in a better oligonucleotide for PCR.
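These assembly changes can be reconciled with step 3 above (three bases of Rlucver2 changed). Assuming, as the CGC and AGA triplets suggest, that positions 218-220 form an in-frame codon, the swap alters two bases without changing the encoded Arg, and the single C to A change at position 619 brings the total to three. A hypothetical sketch of that bookkeeping:

```python
# The six standard codons for arginine.
ARG_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}

def base_changes(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))

cgc_to_aga = base_changes("CGC", "AGA")    # two substitutions
pos619 = base_changes("C", "A")            # one substitution
synonymous = {"CGC", "AGA"} <= ARG_CODONS  # both triplets encode Arg (if in frame)

if __name__ == "__main__":
    print(cgc_to_aga + pos619, synonymous)  # 3 True
```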



Gene Assembly Strategy

The gene assembly protocol employed for the synthetic Renilla luciferase was similar to that described in Example 1.
Sense Strand primer:

5' AACCATGGCTTCCAAGGTGTACGACCCCGAGCAACGCAAA 3' (SEQ ID NO:66)

Anti-sense Strand primer:

5' GCTCTAGAATTACTGCTCGTTCTTCAGCACGCGCTCCACG 3' (SEQ ID NO:67)
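As a consistency check on the two primers, the sense primer can be translated from its ATG and the anti-sense primer from its reverse complement. The sketch below assumes the standard genetic code, and its helper names are hypothetical. It shows the Nco I site (CCATGG) overlapping the start codon with GCT (Ala) as the second codon, and, on the anti-sense primer, the last nine codons of the gene followed by a TAA stop codon and the Xba I site (TCTAGA).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def translate(seq):
    """Translate the complete codons of seq, starting at its first base."""
    s = seq.upper()
    return "".join(CODON_TABLE[s[i:i + 3]] for i in range(0, len(s) - 2, 3))

sense = "AACCATGGCTTCCAAGGTGTACGACCCCGAGCAACGCAAA"      # SEQ ID NO:66
antisense = "GCTCTAGAATTACTGCTCGTTCTTCAGCACGCGCTCCACG"  # SEQ ID NO:67

rc = revcomp(antisense)
stop_end = rc.find("TAA") + 3  # end of the stop codon on the coding strand

if __name__ == "__main__":
    print(sense.find("CCATGG"), antisense.find("TCTAGA"))  # 2 2 (Nco I, Xba I)
    print(translate(sense[sense.find("ATG"):]))            # MASKVYDPEQRK
    print(translate(rc[stop_end % 3:stop_end]))            # VERVLKNEQ*
```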

The resulting synthetic gene fragment was cloned into a pRAM vector using Nco I and Xba I. Two clones having the correct size insert were sequenced. Four to six mutations were found in the synthetic gene from each clone. These mutations were corrected by site-directed mutagenesis (Gene Editor from Promega Corp., Madison, WI) and by swapping the correct regions between these two genes. The corrected gene was confirmed by sequencing.

Other Vectors 

To prepare an expression vector for the synthetic Renilla luciferase gene in a pGL3-control vector backbone, 5 µg of pGL3-control was digested with Nco I and Xba I in a 50 µl final volume with 2 µl of each enzyme and 5 µl of 10X buffer B (nanopure water was used to bring the volume to 50 µl). The digestion reaction was incubated at 37°C for 2 hours, and the whole mixture was run on a 1% agarose gel in 1X TAE. The desired vector backbone fragment was purified using Qiagen's QIAquick gel extraction kit.

The native Renilla luciferase gene fragment was cloned into the pGL3-control vector using two oligonucleotides, Nco I-RL-F and Xba I-RL-R, to PCR amplify the native Renilla luciferase gene using pRL-CMV as the template. The sequence of Nco I-RL-F is 5'-CGCTAGCCATGGCTTCGAAAGTTTATGATCC-3' (SEQ ID NO:68); the sequence of Xba I-RL-R is 5'-GGCCAGTAACTCTAGAATTATTGTT-3' (SEQ ID NO:69). The PCR reaction was carried out as follows:
Reaction mixture (for 100 µl):

DNA template (plasmid)    1.0 µl (1.0 ng/µl final)
10X Rec. Buffer           10.0 µl (Stratagene Corp.)
dNTPs (25 mM each)        1.0 µl (250 µM final)
Primer 1 (10 µM)          2.0 µl (0.2 µM final)
Primer 2 (10 µM)          2.0 µl (0.2 µM final)
Pfu DNA Polymerase        2.0 µl (2.5 U/µl, Stratagene Corp.)
Double-distilled water    82.0 µl

WO 2006/034061    PCT/US2005/033218
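The final concentrations in the table follow from C1·V1 = C2·V2 with a 100 µl total volume. The sketch below (variable names hypothetical) recomputes them and checks that the volumes add up to 100 µl. By the same relation, 1.0 µl of template giving 1.0 ng/µl final implies a template stock of about 100 ng/µl, which the text does not state explicitly.

```python
import math

TOTAL_UL = 100.0

# (component, stock concentration, volume added in µl); the final value has the stock's unit.
STOCKS = [
    ("dNTPs, each (µM)", 25_000.0, 1.0),   # 25 mM stock
    ("Primer 1 (µM)", 10.0, 2.0),
    ("Primer 2 (µM)", 10.0, 2.0),
    ("Pfu DNA polymerase (U/µl)", 2.5, 2.0),
    ("Rec. buffer (X)", 10.0, 10.0),
]
OTHER_UL = {"DNA template": 1.0, "water": 82.0}

def final_conc(stock, vol_ul, total_ul=TOTAL_UL):
    """C1*V1 = C2*V2, solved for the final concentration C2."""
    return stock * vol_ul / total_ul

total = sum(v for _, _, v in STOCKS) + sum(OTHER_UL.values())
assert math.isclose(total, TOTAL_UL), total

if __name__ == "__main__":
    for name, stock, vol in STOCKS:
        print(f"{name}: {final_conc(stock, vol):g}")
```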



PCR Reaction: 94°C for 2 minutes; 25 cycles of 94°C for 20 seconds, 65°C for 1 minute and 72°C for 2 minutes; then 72°C for 5 minutes; then incubation on ice. The PCR-amplified fragment was cut from a gel, and the DNA was purified and stored at -20°C.

To introduce the native Renilla luciferase gene fragment into the pGL3-control vector, 5 µg of the PCR product of the native Renilla luciferase gene (RAM-RL-synthetic) was digested with Nco I and Xba I. The desired Renilla luciferase gene fragment was purified and stored at -20°C.

Then 100 ng of the Nco I/Xba I-digested insert and 100 ng of the Nco I/Xba I-digested pGL3-control vector backbone were ligated together, and 2 µl of the ligation mixture was transformed into JM109 competent cells. Eight ampicillin-resistant clones were picked and their DNA isolated. DNA from each positive clone of pGL3-control-native and pGL3-control-synthetic was purified. The correct sequences of the native gene and the synthetic gene in the vectors were confirmed by DNA sequencing.
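Because equal masses of insert and vector were ligated, the molar insert-to-vector ratio depends only on their lengths (moles of double-stranded DNA scale with mass divided by length). As a rough, hypothetical illustration, assuming an insert of about 950 bp and an Nco I/Xba I-cut pGL3-control backbone of about 3,600 bp (neither length is stated in this section), 100 ng of each gives roughly a 3.8-fold molar excess of insert:

```python
def insert_to_vector_ratio(insert_ng, insert_bp, vector_ng, vector_bp):
    """Molar insert:vector ratio; moles of dsDNA are proportional to mass / length."""
    return (insert_ng / insert_bp) / (vector_ng / vector_bp)

if __name__ == "__main__":
    # Hypothetical lengths: ~950 bp insert, ~3,600 bp Nco I/Xba I-cut backbone.
    print(f"{insert_to_vector_ratio(100, 950, 100, 3600):.1f}")  # 3.8
```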

To determine whether the synthetic Renilla luciferase gene has improved expression in mammalian cells, the gene was cloned into the mammalian expression vector pGL3-control under the control of the SV40 promoter and SV40 early enhancer. The native Renilla luciferase gene was also cloned into the pGL3-control vector so that expression from the synthetic gene and the native gene could be compared. The expression vectors were then transfected into four common mammalian cell lines (CHO, NIH3T3, HeLa and CV-1; Table 9), and the expression levels were compared between the vectors with the synthetic gene versus the native gene. Two different amounts of DNA were used to ascertain that expression from the synthetic gene is consistently increased at different expression levels. The results show a 70- to 600-fold increase in expression for the synthetic Renilla luciferase gene in these cells (Table 4).

Table 4

Cell Type    Amount of Vector    Fold Expression Increase
CHO          0.2 µg              142
             2.8 µg              145
NIH3T3       0.2 µg              326
             2.0 µg              593
HeLa         0.2 µg              185
             1.0 µg              103
CV-1         0.2 µg              68
             2.0 µg              72

One important advantage of a luciferase reporter is its short protein half-life. The enhanced expression could instead result from an extended protein half-life, which would be an undesirable property of the new gene. This possibility was ruled out by a cycloheximide chase ("CHX chase") experiment, which demonstrated that the humanized Renilla luciferase gene did not produce an increase in protein half-life.
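In a cycloheximide chase, new protein synthesis is blocked, so the remaining luciferase activity decays over time; if the decay is first order, the half-life is ln 2 divided by the negative slope of ln(activity) versus time. The sketch below is a hypothetical illustration of that calculation, not the analysis used in the source. Its data are synthetic, generated from an assumed 3-hour half-life so that the fit can be checked; comparing such fitted half-lives for the native and synthetic proteins is what the CHX chase tests.

```python
import math

def half_life(times_h, signals):
    """Least-squares fit of ln(signal) = ln(S0) - k*t; returns ln(2)/k in the time unit."""
    if len(times_h) != len(signals) or len(times_h) < 2:
        raise ValueError("need at least two paired time points")
    ys = [math.log(s) for s in signals]
    t_mean = sum(times_h) / len(times_h)
    y_mean = sum(ys) / len(ys)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_h, ys))
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    k = -sxy / sxx  # first-order decay constant (per hour)
    return math.log(2) / k

# Hypothetical, noise-free activities generated from a 3 h half-life.
times = [0, 1, 2, 4, 6]
activity = [1000 * 0.5 ** (t / 3.0) for t in times]

if __name__ == "__main__":
    print(round(half_life(times, activity), 3))  # 3.0
```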

To ensure that the increase in expression is not limited to one vector backbone, promoter, or cell type, a synthetic Renilla gene (Rluc-final) as well as the native Renilla gene were cloned into different vector backbones and placed under different promoters. The synthetic gene always exhibited increased expression compared to its wild-type counterpart (Table 5).

Table 5

Vector                         NIH-3T3       HeLa          CHO
pRL-tk, native                 3,834.6       922.4         7,671.9
pRL-tk, synthetic              13,252.5      9,040.2       41,743.5
pRL-CMV, native                168,062.2     842,482.5     153,539.5
pRL-CMV, synthetic             2,168,129     8,440,306     2,532,576
pRL-SV40, native               224,224.4     346,787.6     85,323.6
pRL-SV40, synthetic            1,469,588     2,632,510     1,422,830
pRL-null, native               2,853.8       431.7         2,434
pRL-null, synthetic            9,151.17      2,439         28,317.1
pRGL3b, native                 12            21.8          17
pRGL3b, synthetic              130.5         212.4         1,094.5
pRGL3-tk, native               27.9          155.5         186.4
pRGL3-tk, synthetic            6,778.2       8,782.5       9,685.9
pRL-tk no intron, native       31.8          165           93.4
pRL-tk no intron, synthetic    6,665.5       6,379         21,433.1
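The statement that the synthetic gene always exhibited increased expression can be checked directly from Table 5 by dividing each synthetic value by its native counterpart. A minimal sketch (names hypothetical; the numbers and row labels are those of Table 5):

```python
# (native, synthetic) reporter activities from Table 5; columns NIH-3T3, HeLa, CHO.
CELLS = ("NIH-3T3", "HeLa", "CHO")
TABLE5 = {
    "pRL-tk": ((3834.6, 922.4, 7671.9), (13252.5, 9040.2, 41743.5)),
    "pRL-CMV": ((168062.2, 842482.5, 153539.5), (2168129, 8440306, 2532576)),
    "pRL-SV40": ((224224.4, 346787.6, 85323.6), (1469588, 2632510, 1422830)),
    "pRL-null": ((2853.8, 431.7, 2434), (9151.17, 2439, 28317.1)),
    "pRGL3b": ((12, 21.8, 17), (130.5, 212.4, 1094.5)),
    "pRGL3-tk": ((27.9, 155.5, 186.4), (6778.2, 8782.5, 9685.9)),
    "pRL-tk no intron": ((31.8, 165, 93.4), (6665.5, 6379, 21433.1)),
}

def fold_increase(native, synthetic):
    """Synthetic/native activity ratio for each cell line."""
    return [s / n for n, s in zip(native, synthetic)]

if __name__ == "__main__":
    for vector, (nat, syn) in TABLE5.items():
        folds = fold_increase(nat, syn)
        print(vector, ", ".join(f"{c}: {f:.1f}x" for c, f in zip(CELLS, folds)))
```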




Table 6

                          Percent of control vector
Vector                    CHO cells    NIH3T3 cells    HeLa cells
pRL-control native        100          100             100
pRL-control synthetic     100          100             100
pRL-basic native          4.1          5.6             0.2
pRL-basic synthetic       0.4          0.1             0.0
pRL-promoter native       5.9          7.8             0.6
pRL-promoter synthetic    15.0         9.9             1.1
pRL-enhancer native       42.1         123.9           52.7
pRL-enhancer synthetic    2.6          1.5             5.4



With reduced spurious expression, the synthetic gene should exhibit less basal level transcription in a promoterless vector. The synthetic and native Renilla luciferase genes were cloned into the pGL3-basic vector to compare the basal level of transcription. Because the synthetic gene itself has increased expression efficiency, the activities from the promoterless vectors cannot be compared directly to judge the difference in basal transcription; instead, this is taken into consideration by comparing the activity from the promoterless vector as a percentage of that from the control vector (expression from the basic vector divided by the expression from the fully functional expression vector with both promoter and enhancer elements). The data demonstrate that the synthetic Renilla luciferase gene has a lower level of basal transcription than the native gene in mammalian cells (Table 6).

It is well known to those skilled in the art that an enhancer can substantially stimulate promoter activity. To test whether the synthetic gene has a reduced risk of inappropriate transcriptional characteristics, the native and synthetic genes were introduced into a vector with an enhancer element (pGL3-enhancer vector). Because the synthetic gene has higher expression efficiency, the activities of the two cannot be compared directly to assess the level of transcription in the presence of the enhancer; instead, this is taken into account by using the activity from the enhancer vector as a percentage of that from the control vector (expression in the presence of the enhancer divided by the expression from the fully functional expression vector with both promoter and enhancer elements). The results show that when the native gene is present, the enhancer alone is able to stimulate transcription to 42-124% of the control; however, when the native gene is replaced by the synthetic gene in the same vector, the activity constitutes only about 1-5% of the value obtained when the same enhancer and the strong SV40 promoter are employed. This demonstrates that the synthetic gene has a reduced risk of spurious expression (Table 6).
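The normalization used in these two comparisons divides each test-vector activity by the activity of the matched control vector, so a gene that is simply expressed more efficiently does not appear to have more basal or enhancer-driven transcription. The sketch below (hypothetical names) shows that a uniform 100-fold gain cancels out of the ratio, using invented illustrative activities, and then summarizes the Table 6 percentages behind the 42-124% and roughly 1-5% figures quoted above.

```python
# Percent of control vector from Table 6; columns CHO, NIH3T3, HeLa.
TABLE6 = {
    ("basic", "native"): (4.1, 5.6, 0.2),
    ("basic", "synthetic"): (0.4, 0.1, 0.0),
    ("promoter", "native"): (5.9, 7.8, 0.6),
    ("promoter", "synthetic"): (15.0, 9.9, 1.1),
    ("enhancer", "native"): (42.1, 123.9, 52.7),
    ("enhancer", "synthetic"): (2.6, 1.5, 5.4),
}

def percent_of_control(test_activity, control_activity):
    """Activity of a test vector as a percentage of its matched control vector."""
    return 100.0 * test_activity / control_activity

if __name__ == "__main__":
    # Hypothetical activities: a uniform 100-fold gain cancels out of the ratio.
    print(percent_of_control(500, 10_000), percent_of_control(50_000, 1_000_000))
    for vector in ("basic", "enhancer"):
        nat, syn = TABLE6[(vector, "native")], TABLE6[(vector, "synthetic")]
        print(f"{vector}: native {min(nat)}-{max(nat)}%, synthetic {min(syn)}-{max(syn)}%")
```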

The synthetic Renilla gene (Rluc-final) was used in in vitro systems to compare translation efficiency with the native gene. In a T7 Quick coupled transcription/translation system (Promega Corp., Madison, WI), pRL-null-native plasmid (having the native Renilla luciferase gene under the control of the T7 promoter) or the same amount of pRL-null-synthetic plasmid (having the synthetic Renilla luciferase gene under the control of the T7 promoter) was added to the TNT reaction mixture, and luciferase activity was measured every 5 minutes up to 60 minutes. The Dual-Luciferase assay kit (Promega Corp.) was used to measure Renilla luciferase activity. The data showed that improved expression was obtained from the synthetic gene. To further confirm the increased translation efficiency of the synthetic gene, RNA was prepared by an in vitro transcription system and then purified. pRL-null (native or synthetic) vectors were linearized with BamH I. The DNA was purified by multiple phenol-chloroform extractions followed by ethanol precipitation. An in vitro T7 transcription system was employed to prepare RNAs. The DNA template was removed using RNase-free DNase, and RNA was purified by phenol-chloroform extraction followed by multiple isopropanol precipitations. The same amount of purified RNA, for either the synthetic gene or the native gene, was then added to a rabbit reticulocyte lysate or wheat germ lysate. Again, the synthetic Renilla luciferase gene RNA produced more luciferase than the native one. These data suggest that translation efficiency is improved by the synthetic sequence. To determine why the synthetic gene was highly expressed in wheat germ, plant codon usage was examined. The lowest usage codons in higher plants coincided with those in mammals.

Reporter gene assays are widely used to study transcriptional regulation events. This is often carried out in co-transfection experiments in which, along with the primary reporter construct containing the promoter being tested, a second control reporter under a constitutive promoter is transfected into cells as an internal control to normalize experimental variations, including transfection efficiencies, between samples. The control reporter signal, potential promoter cross-talk between the control reporter and the primary reporter, and potential regulation of the control reporter by experimental conditions are important aspects to consider when selecting a reliable co-reporter vector.

As described above, vector constructs were made by cloning the synthetic Renilla luciferase gene into different vector backbones under different promoters. All the constructs showed higher expression in the three mammalian cell lines tested (Table 5). Thus, with its better expression efficiency, the synthetic Renilla luciferase gene gives a higher signal when transfected into mammalian cells.

Because a higher signal is obtained, less promoter activity is required to achieve the same reporter signal, which reduces the risk of promoter interference. CHO cells were transfected with 50 ng pGL3-control (firefly luc+) plus one of five different amounts of native pRL-TK plasmid (50, 100, 500, 1000, or 2000 ng) or synthetic pRL-TK plasmid (5, 10, 50, 100, or 200 ng). To each transfection, pUC19 carrier DNA was added to a total of 3 µg DNA. Ten-fold less synthetic pRL-TK DNA gave a signal similar to or greater than that of the native gene, with reduced risk of inhibiting expression from the primary reporter pGL3-control.
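Keeping the total DNA mass constant means that the pUC19 carrier makes up the difference for each reporter amount, and each synthetic pRL-TK amount is one tenth of the matching native amount. A minimal bookkeeping sketch (names hypothetical; amounts from the text):

```python
TOTAL_NG = 3000       # 3 µg total DNA per CHO transfection
PGL3_CONTROL_NG = 50  # primary firefly reporter in every transfection

def carrier_ng(reporter_ng, total_ng=TOTAL_NG, fixed_ng=PGL3_CONTROL_NG):
    """pUC19 carrier (ng) that brings a transfection up to a constant total mass."""
    carrier = total_ng - fixed_ng - reporter_ng
    if carrier < 0:
        raise ValueError("plasmid mass exceeds the fixed total")
    return carrier

NATIVE_NG = [50, 100, 500, 1000, 2000]
SYNTHETIC_NG = [5, 10, 50, 100, 200]

if __name__ == "__main__":
    print([carrier_ng(x) for x in NATIVE_NG])     # [2900, 2850, 2450, 1950, 950]
    print([carrier_ng(x) for x in SYNTHETIC_NG])  # [2945, 2940, 2900, 2850, 2750]
```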

Experimental treatments may sometimes activate cryptic sites within the gene and cause induction or suppression of co-reporter expression, which would compromise its function as a co-reporter for normalization of transfection efficiencies. One example is that TPA induces expression of co-reporter vectors harboring the wild-type gene in transfected MCF-7 cells. Per well of MCF-7 cells, 500 ng pRL-TK (native), 5 µg native or synthetic pRG-B, or 2.5 µg native or synthetic pRG-TK was transfected. 100 ng/well pGL3-control (firefly luc+) was co-transfected with all RL plasmids. Carrier DNA (pUC19) was used to bring the total DNA transfected to 5.1 µg/well, and 15.3 µl TransFast Transfection Reagent (Promega Corp., Madison, WI) was added per well. Sixteen hours later, cells were trypsinized, pooled, split into six wells of a 6-well dish and allowed to attach for 8 hours. Three wells were then treated with 0.2 nM of the tumor promoter TPA (phorbol-12-myristate-13-acetate, Calbiochem #524400-S), and three wells were mock treated with 20 µl DMSO. Cells were harvested with 0.4 ml Passive Lysis Buffer 24 hours after TPA addition. The results showed that by using the synthetic gene, undesirable changes in co-reporter expression caused by experimental stimuli can be avoided (Table 7). This demonstrates that using the synthetic gene can reduce the risk of anomalous expression.

Table 7

Vector                          Rlu       Fold Induction
pRL-tk untreated (native)       184
pRL-tk TPA treated (native)     812       4.4
pRG-B untreated (native)        1
pRG-B TPA treated (native)      8         8.0
pRG-B untreated (final)         132
pRG-B TPA treated (final)       195       1.47
pRG-tk untreated (native)       44
pRG-tk TPA treated (native)     192       4.36
pRG-tk untreated (final)        12,816
pRG-tk TPA treated (final)      11,347    0.88
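The fold-induction column of Table 7 is the TPA-treated Rlu divided by the untreated Rlu for the same vector. The sketch below (names hypothetical) recomputes it from the table: the native constructs are induced about 4- to 8-fold, whereas the final (synthetic) constructs change by less than 1.5-fold.

```python
# (untreated, TPA-treated) Renilla luciferase readings (Rlu) from Table 7.
TABLE7 = {
    "pRL-tk (native)": (184, 812),
    "pRG-B (native)": (1, 8),
    "pRG-B (final)": (132, 195),
    "pRG-tk (native)": (44, 192),
    "pRG-tk (final)": (12816, 11347),
}

def fold_induction(untreated, treated):
    """TPA-treated signal divided by the untreated signal."""
    return treated / untreated

if __name__ == "__main__":
    for name, (untreated, treated) in TABLE7.items():
        print(f"{name}: {fold_induction(untreated, treated):.2f}")
```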



Example 3

Synthetic Firefly Luciferase Genes
The luc+ gene (U.S. Patent No. 5,670,356) was optimized using two approaches. In the first approach (Strategy A), codons were optimized and regulatory sequences, such as consensus transcription factor binding sites (TFBS), were removed (see Example 4, although different versions of programs and databases were used). The sequences obtained for the first approach include hluc+ver2AF1 through hluc+ver2AF8 (designations with an "F" indicate that the construct included flanking sequences). hluc+ver2AF1 is codon-optimized; hluc+ver2AF2 was obtained after a first round of removal of identified undesired sequences including transcription factor binding sites; hluc+ver2AF3, hluc+ver2AF4 and hluc+ver2AF5 were obtained after a second, third and fourth round, respectively, of such removal; hluc+ver2AF6 was obtained after removal of promoter modules and RBS; hluc+ver2AF7 was obtained after further removal of identified undesired sequences including transcription factor binding sites; and hluc+ver2AF8 was obtained after modifying a restriction enzyme recognition site.
Pairwise DNA identity of different P. pyralis luciferase gene versions:



Table 8

              luc   luc+   hluc+   hluc+ver2A1   hluc+ver2B1   hluc+ver2A6   hluc+ver2B6
luc           100   95     76      73            77            74            75
luc+                100    78      76            78            75            77
hluc+                      100     91            81            87            81
hluc+ver2A1                        100           74            91            78
hluc+ver2B1                                      100           74            85
hluc+ver2A6                                                    100           80
hluc+ver2B6                                                                  100
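Because the gene versions compared in Table 8 encode proteins of the same length, one simple way to express pairwise DNA identity is the percentage of identical positions when the sequences are compared base by base. The text does not state which program or alignment settings produced Table 8, so the sketch below (hypothetical names) is only an illustration. Applied to the first 30 bases of luc+ and hluc+ as given below, it gives about 76.7% identity.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identical positions between two ungapped, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("this sketch assumes non-empty, equal-length, ungapped sequences")
    a, b = a.upper(), b.upper()
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)

# First 30 bases (codons 1-10) of luc+ and hluc+ as listed below.
LUC_PLUS_30 = "atggaagacgccaaaaacataaagaaaggc"
HLUC_PLUS_30 = "atggccgatgctaagaacattaagaagggc"

if __name__ == "__main__":
    print(f"{percent_identity(LUC_PLUS_30, HLUC_PLUS_30):.1f}%")
```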
luc+ has the following sequence:

atggaagacgccaaaaacataaagaaaggcccggcgccattctatccgctggaagatggaaccgctggagagca 
actgcataaggctatgaagagatacgccctggttcctggaacaattgcttttacagatgcacatatcgaggtggacatc 
acttacgctgagtacttcgaaatgtccgttcggttggcagaagctatgaaacgatatgggctg^

atcgtcgtatgcagtgaaaactctcttcaattctttatgccggtgttgggcgcgttamatcggagttgcagttgcgccc 
gcgaacgacatttataatgaacgtgaattgctcaacagtatgggcatttcgcagcctaccgtggtgttcgtttccaaaa 
aggggttgcaaaaaatmgaacgtgcaaaaaaagctcccaatcatccaaaaaattattatcatggattctaaa^ 
ttaccagggamcagtcgatgtacacgttcgtcacatctcatctacctcccggtmaatgaatacgattttgtgccaga 

gtccttcgatagggacaagacaattgcactgatcatgaactcctctggatctactggtctgcctaaaggtgtcgctctg
cctcatagaactgcctgcgtgagattctcgcatgccagagatcctatttttggcaatcaaatcattccggatactgcgat 
maagtgttgttccattccatcacggtmggaatgmactacactcggatatttgatatgtggatttcgagtcgtcttaat 
gtatagatttgaagaagagctgtttctgaggagccttcaggattacaagattcaaagtgcgctgctggtgccaacccta 
ttctccttcttcgccaaaagcactctgattgacaaatacgatttatctaatttacacgaaattgcttctggtggcgctcccc 

tctctaaggaagtcggggaagcggttgccaagaggttccatc^

gactacatcagctattctgattacacccgagggggatgataaaccgggcgcggtcggtaaagttgttccattttttgaa 
gcgaaggttgtggatctggataccgggaaaacgctgggcgttaatcaaagaggcgaactgtgtgtgagaggtccta 
tgattatgtccggttatgtaaacaatccggaagcgaccaacgccttgattgacaaggatggatggctacattctggag 
acatagcttactgggacgaagacgaacacttcttcatcgttgaccgcctgaagtctctgattaagtacaaaggct 

ggtggctcccgctgaattggaatccatcttgctccaacaccccaacatcttcgacgcaggtgtcgcaggtcttcccga
cgatgacgccggtgaacttcccgccgccgttgttgttttggagcacggaaagacgatgacggaaaaagagatcgtg 
gattacgtcgccagtcaagtaacaaccgcgaaaaagttgcgcggaggagttgtgtttgtggacgaagtaccgaaag 
gtcttaccggaaaactcgacgcaagaaaaatcagagagatcctcataaaggccaagaagggcggaaagatcgcc 
gtgtaa (SEQ ID NO:43) 


and hluc+ has the following sequence: 

atggccgatgctaagaacattaagaagggccctgctcccttctaccctctggaggatggcaccgctggcgagcagc 
tgcacaaggccatgaagaggtatgccctggtgcctggcaccattgccttcaccgatgcccacattgaggtggacatc 
acctatgccgagtacttcgagatgtctgtgcgcctggccgaggccatgaagaggtacggcctgaacaccaaccacc 
gcatcgtggtgtgctctgagaactctctgcagttcttcatgccagtgctgggcgccctgttcatcggagtggccgtgg
cccctgctaacgacatttacaacgagcgcgagctgctgaacagcatgggcatttctcagcctaccgtggtgttcgtgt 
ctaagaagggcctgcagaagatcctgaacgtgcagaagaagctgcctatcatccagaagatcatcatcatggactct 
aagaccgactaccagggcttccagagcatgtacacattcgtgacatctcatctgcctcctggcttcaacgagtacgac 
ttcgtgccagagtcmcgacagggacaaaaccattgccctgatcatgaacagctctgggtrt^
ggcgtggccctgcctcatcgcaccgcctgtgtgcgcttctctcacgcccgcgaccctattttcggcaaccaga 
cccgacaccgctattctgagcgtggtgccattccaccacggcttcggcatgttcaccaccctgggctacctgatttgc 
ggcmcgggtggtgctgatgtaccgcttcgaggaggagct^^ 

gccctgctggtgccaaccctgttcagcttcttcgctaagagcaccctgatcgacaagtacgacctgtc^

gagattgcctctggcggcgccccactgtctaaggaggtgggcgaagccgtggccaagcgctttcatctgccaggca 
tccgccagggctacggcctgaccgagacaaccagcgccattctgattaccccagagggcgacgacaagcctggc 
gccgtgggcaaggtggtgccattcttcgaggccaaggtggtggacctggacaccggcaagaccctgggagtgaa 
ccagcgcggcgagctgtgtgtgcgcggccctatgattatgtccggctacgtgaataaccctgaggccacaaacgcc 

ctgatcgacaaggacggctggctgcactctggcgacattgcctactgggacgaggacgagcacttcttcatcgtgga
ccgcctgaagtctctgatcaagtacaagggctaccaggtggccccagccgagctggagtctatcctgctgcagcac 
cctaacattttcgacgccggagtggccggcctgcccgacgacgatgccggcgagctgcctgccgccgtcgtcgtg 
ctggaacacggcaagaccatgaccgagaaggagatcgtggactatgtggccagccaggtgacaaccgccaagaa 
gctgcgcggcggagtggtgttcgtggacgaggtgcccaagggcctgaccggcaagctggacgcccgcaagatcc 

gcgagatcctgatcaaggctaagaaaggcggcaagatcgccgtgtaa (SEQ ID NO:14).

Table 9 
hluc+ver2A1-hluc+ver2A5 have the following sequences (SEQ ID NOs:16-20):
hluc+ver2A1

AAAGCCACCATGGAGGACGCCAAGAACATCAAGAAGGGCCCCGCCC
CCTTCTACCCCCTGGAGGACGGCACCGCCGGCGAGCAGCTGCACAAG 
GCCATGAAGCGCTACGCCCTGGTGCCCGGCACCATCGCCTTCACCGA 
CGCCCACATCGAGGTGGACATCACCTACGCCGAGTACTTCGAGATGA 
GCGTGCGCCTGGCCGAGGCCATGAAGCGCTACGGCCTGAACACCAAC 

CACCGCATCGTGGTGTGCAGCGAGAACAGCCTGCAGTTCTTCATGCC
CGTGCTGGGCGCCCTGTTCATCGGCGTGGCCGTGGCCCCCGCCAACG 
ACATCTACAACGAGCGCGAGCTGCTGAACAGCATGGGCATCAGCCAG 
CCCACCGTGGTGTTCGTGAGCAAGAAGGGCCTGCAGAAGATCCTGAA 
CGTGCAGAAGAAGCTGCCCATCATCCAGAAGATCATCATCATGGACA 

GCAAGACCGACTACCAGGGCTTCCAGAGCATGTACACCTTCGTGACC
AGCCACCTGCCCCCCGGCTTCAACGAGTACGACTTCGTGCCCGAGAG 
CTTCGACCGCGACAAGACCATCGCCCTGATCATGAACAGCAGCGGCA 
GCACCGGCCTGCCCAAGGGCGTGGCCCTGCCCCACCGCACCGCCTGC 
GTGCGCTTCAGCCACGCCCGCGACCCCATCTTCGGCAACCAGATCAT 

CCCCGACACCGCCATCCTGAGCGTGGTGCCCTTCCACCACGGCTTCG
GCATGTTCACCACCCTGGGCTACCTGATCTGCGGCTTCCGCGTGGTGC 
TGATGTACCGCTTCGAGGAGGAGCTGTTCCTGCGCAGCCTGCAGGAC 
TACAAGATCCAGAGCGCCCTGCTGGTGCCCACCCTGTTCAGCTTCTTC 
GCCAAGAGCACCCTGATCGACAAGTACGACCTGAGCAACCTGCACGA 

GATCGCCAGCGGCGGCGCCCCCCTGAGCAAGGAGGTGGGCGAGGCC
GTGGCCAAGCGCTTCCACCTGCCCGGCATCCGCCAGGGCTACGGCCT 
GACCGAGACCACCAGCGCCATCCTGATCACCCCCGAGGGCGACGACA 
AGCCCGGCGCCGTGGGCAAGGTGGTGCCCTTCTTCGAGGCCAAGGTG 
GTGGACCTGGACACCGGCAAGACCCTGGGCGTGAACCAGCGCGGCG 

AGCTGTGCGTGCGCGGCCCCATGATCATGAGCGGCTACGTGAACAAC
CCCGAGGCCACCAACGCCCTGATCGACAAGGACGGCTGGCTGCACAG 
CGGCGACATCGCCTACTGGGACGAGGACGAGCACTTCTTCATCGTGG 
ACCGCCTGAAGAGCCTGATCAAGTACAAGGGCTACCAGGTGGCCCCC 
GCCGAGCTGGAGAGCATCCTGCTGCAGCACCCCAACATCTTCGACGC 
CGGCGTGGCCGGCCTGCCCGACGACGACGCCGGCGAGCTGCCCGCCG
CCGTGGTGGTGCTGGAGCACGGCAAGACCATGACCGAGAAGGAGAT 
CGTGGACTACGTGGCCAGCCAGGTGACCACCGCCAAGAAGCTGCGCG 
GCGGCGTGGTGTTCGTGGACGAGGTGCCCAAGGGCCTGACCGGCAAG 
CTGGACGCCCGCAAGATCCGCGAGATCCTGATCAAGGCCAAGAAGG
GCGGCAAGATCGCCGTGTAATAATTCTAGA 

hluc+ver2A2 

AAAGCCACCATGGAGGACGCCAAGAACATCAAGAAGGGCCCAGCGC 

CATTCTACCCCCTGGAGGACGGCACCGCCGGCGAGCAGCTGCACAAG
GCCATGAAGCGCTACGCCCTGGTGCCCGGCACCATCGCCTTCACCGA 
CGCACATATCGAGGTGGACATCACCTACGCCGAGTACTTCGAGATGA 
GCGTTCGGCTGGCAGAGGCTATGAAGCGCTATGGGCTGAACACCAAC 
CATCGCATCGTGGTGTGCAGCGAGAACAGCTTGCAGTTCTTCATGCC 

CGTGTTGGGTGCCCTGTTCATCGGCGTGGCTGTGGCCCCAGCTAACG
ACATCTACAACGAGCGCGAGCTGCTGAACAGCATGGGCATCAGCCAG 
CCCACCGTCGTATTCGTGAGCAAGAAAGGGCTGCAAAAGATCCTGAA 
CGTGCAAAAGAAGCTGCCCATCATCCAAAAGATCATCATCATGGACA 
GCAAGACCGACTACCAGGGCTTCCAAAGCATGTACACCTTCGTGACC

AGCCATTTGCCGCCCGGCTTCAACGAGTACGACTTCGTGCCCGAGAG
CTTCGACCGCGACAAGACCATCGCCCTGATCATGAACAGTAGTGGCA 
GTACCGGCTTACCTAAGGGCGTGGCCCTACCGCACCGCACCGCCTGT 
GTCCGATTCAGTCATGCCCGCGACCCCATCTTCGGCAACCAGATCATC 
CCCGACACCGCTATCCTGAGCGTGGTGCCATTTCACCACGGCTTCGGC 

ATGTTCACCACCCTGGGCTACTTGATCTGCGGCTTCCGGGTCGTGCTG
ATGTACCGCTTCGAGGAGGAGCTATTCTTGCGCAGCTTGCAAGACTA 
CAAGATTCAAAGCGCCCTGCTGGTGCCCACCCTGTTCAGTTTCTTCGC 
CAAGAGCACCCTGATCGACAAGTACGACCTGAGCAACCTGCACGAG 
ATCGCCAGCGGCGGCGCCCCGCTCAGCAAGGAGGTGGGCGAGGCCG 

TGGCCAAGCGCTTCCACCTGCCAGGCATCCGCCAGGGCTACGGCCTG
ACCGAGACAACCAGCGCCATTCTGATCACCCCCGAGGGGGACGACA 
AGCCTGGCGCAGTAGGCAAGGTGGTGCCCTTCTTCGAGGCTAAGGTG 
GTGGACCTGGACACCGGTAAAACCCTGGGTGTGAACCAGCGCGGCG 
AGCTGTGCGTCCGTGGCCCCATGATCATGAGCGGCTACGTTAACAAC
CCCGAGGCTACAAACGCCCTGATCGACAAGGACGGCTGGCTGCACAG
CGGCGACATCGCCTACTGGGACGAGGACGAGCACTTCTTCATCGTGG 
ACCGGCTGAAGAGCCTGATCAAATACAAGGGCTACCAGGTAGCCCCA 
GCCGAACTGGAGAGCATCCTGCTGCAGCACCCCAACATCTTCGACGC
CGGGGTCGCCGGCCTGCCCGACGACGATGCCGGCGAGCTGCCCGCCG 
CAGTCGTGGTGCTGGAGCACGGTAAAACCATGACCGAGAAGGAGAT
CGTGGACTATGTGGCCAGCCAGGTTACAACCGCCAAGAAGCTGCGCG
GCGGCGTGGTGTTCGTGGACGAGGTGCCTAAAGGCCTGACGGGCAAG 
TTGGACGCCCGCAAGATCCGCGAGATTCTGATCAAGGCCAAGAAGGG
CGGCAAGATCGCCGTGTAATAATTCTAGA 

hluc+ver2A3 

AAAGCCACCATGGAAGATGCCAAAAACATTAAGAAGGGCCCAGCGC 

CATTCTACCCACTGGAGGACGGCACCGCCGGCGAGCAGCTGCACAAA
GCCATGAAGCGCTACGCCCTGGTGCCCGGCACCATCGCCTTTACCGA
CGCACATATCGAGGTGGACATCACCTACGCCGAGTACTTCGAGATGA
GCGTTCGGCTGGCAGAGGCTATGAAGCGCTATGGGCTGAATACCAAC
CATCGCATCGTGGTGTGCAGCGAGAATAGCTTGCAGTTCTTCATGCCC

GTGTTGGGTGCCCTGTTCATCGGTGTGGCTGTGGCCCCAGCTAACGAC
ATCTACAACGAGCGCGAGCTGCTGAACAGCATGGGCATCAGCCAGCC 
CACCGTCGTATTCGTGAGCAAGAAAGGGCTGCAAAAGATCCTCAACG 
TGCAAAAGAAGCTACCGATCATACAAAAGATCATCATCATGGATAGC 
AAGACCGACTACCAGGGCTTCCAAAGCATGTACACCTTCGTGACCAG

CCATTTGCCACCCGGCTTCAACGAGTACGACTTCGTGCCCGAGAGCTT
CGACCGGGACAAAACCATCGCCCTGATCATGAACAGTAGTGGCAGTA 
CCGGATTGCCCAAGGGCGTAGCCCTACCGCACCGCACCGCCTGTGTC 
CGATTCAGTCATGCCCGCGACCCCATCTTCGGCAACCAGATCATCCCC 
GACACCGCTATCCTCAGCGTGGTGCCATTTCACCACGGCTTCGGCATG 

TTCACCACGCTGGGCTACTTGATCTGCGGCTTTCGGGTCGTGCTCATG
TACCGCTTCGAGGAGGAGCTATTCTTGCGCAGCTTGCAAGACTATAA
GATTCAAAGCGCCCTGCTGGTGCCCACACTGTTCAGCTTCTTCGCCAA 
GAGCACTCTCATCGACAAGTACGACCTGAGCAACCTGCACGAGATCG 
CCAGCGGCGGGGCGCCGCTCAGCAAGGAGGTGGGCGAGGCCGTGGC
CAAGCGCTTCCACCTACCAGGCATCCGCCAGGGCTACGGCCTGACAG 
AAACAACCAGCGCCATTCTGATCACCCCCGAAGGGGACGACAAGCCT 
GGCGCAGTAGGCAAGGTGGTGCCCTTCTTCGAGGCTAAGGTGGTGGA 

CTTGGACACCGGTAAGACCCTGGGTGTGAACCAGCGCGGCGAGCTGT
GCGTCCGTGGCCCCATGATCATGAGCGGCTACGTTAACAACCCCGAG 
GCTACAAACGCTCTCATCGACAAGGACGGCTGGCTGCACAGCGGCGA 
CATCGCCTACTGGGACGAGGACGAGCACTTCTTCATCGTGGACCGGC 
TGAAGAGCCTGATCAAATACAAGGGCTACCAGGTAGCCCCAGCCGA

ACTGGAGAGCATCCTGCTGCAACACCCCAACATCTTCGACGCCGGGG
TCGCCGGCCTGCCCGACGACGATGCCGGCGAGCTGCCCGCCGCAGTC 
GTCGTGCTGGAGCACGGTAAAACCATGACCGAGAAGGAGATCGTGG
ACTATGTGGCCAGCCAGGTTACAACCGCCAAGAAGCTGCGCGGTGGT 
GTTGTGTTCGTGGACGAGGTGCCTAAAGGCCTGACGGGCAAGTTGGA 

CGCCCGCAAGATCCGCGAGATTCTCATTAAGGCCAAGAAGGGCGGCA
AGATCGCCGTGTAATAATTCTAGA 

hluc+ver2A4 

AAAGCCACCATGGAAGATGCCAAAAACATTAAGAAGGGCCCAGCGC 
CATTCTACCCACTCGAAGACGGCACCGCCGGCGAGCAGCTGCACAAA
GCCATGAAGCGCTACGCCCTGGTGCCCGGCACCATCGCCTTTACCGA 
CGCACATATCGAGGTGGACATTACCTACGCCGAGTACTTCGAGATGA 
GCGTTCGGCTGGCAGAAGCTATGAAGCGCTATGGGCTGAACACCAAC
CATCGCATCGTGGTGTGCAGCGAGAATAGCTTGCAGTTCTTCATGCCC
GTGTTGGGTGCCCTGTTCATCGGTGTGGCTGTGGCCCCAGCTAACGAC
ATCTACAACGAGCGCGAGCTGCTGAACAGCATGGGCATCAGCCAGCC 
CACCGTCGTATTCGTGAGCAAGAAAGGGCTGCAAAAGATCCTCAACG
TGCAAAAGAAGCTACCGATCATACAAAAGATCATCATCATGGATAGC 
AAGACCGACTACCAGGGCTTCCAAAGCATGTACACCTTCGTGACTTC 
CCATTTGCCACCCGGCTTCAACGAGTACGACTTCGTGCCCGAGAGCTT
CGACCGGGACAAAACCATCGCCCTGATCATGAACAGTAGTGGCAGTA 
CCGGATTGCCCAAGGGCGTAGCCCTACCGCACCGCACCGCTTGTGTC
CGATTCAGTCATGCCCGCGACCCCATCTTCGGCAACCAGATCATCCCC
GACACCGCTATCCTCAGCGTGGTGCCATTTCACCACGGCTTCGGCATG
TTCACCACGCTGGGCTACTTGATCTGCGGCTTTCGGGTCGTGCTCATG
TACCGCTTCGAGGAGGAGCTATTCTTGCGCAGCTTGCAAGACTATAA
GATTCAAAGCGCCCTGCTGGTGCCCACACTGTTCAGTTTCTTCGCCAA 

GAGCACTCTCATCGACAAGTACGACCTAAGCAACTTGCACGAGATCG
CCAGCGGCGGGGCGCCGCTCAGCAAGGAGGTGGGCGAGGCCGTGGC 
CAAACGCTTCCACCTACCAGGCATCCGCCAGGGCTACGGCCTGACAG
AAACAACCAGCGCCATTCTGATCACCCCCGAAGGGGACGACAAGCCT
GGCGCAGTAGGCAAGGTGGTGCCCTTCTTCGAGGCTAAGGTGGTGGA 

CTTGGACACCGGTAAGACACTGGGTGTGAACCAGCGCGGCGAGCTGT
GCGTCCGTGGCCCCATGATCATGAGCGGCTACGTTAACAACCCCGAG 
GCTACAAACGCTCTCATCGACAAGGACGGCTGGCTGCACAGCGGCGA 
CATCGCCTACTGGGACGAGGACGAGCACTTCTTCATCGTGGACCGGC
TGAAGAGCCTGATCAAATACAAGGGCTACCAGGTAGCCCCAGCCGA 

ACTGGAGAGCATCCTGCTGCAACACCCCAACATCTTCGACGCCGGGG
TCGCCGGCCTGCCCGACGACGATGCCGGCGAGCTGCCCGCCGCAGTC 
GTCGTGCTGGAACACGGTAAAACCATGACCGAGAAGGAGATCGTGG 
ACTATGTGGCCAGCCAGGTTACAACCGCCAAGAAGCTGCGCGGTGGT 
GTTGTGTTCGTGGACGAGGTGCCTAAAGGCCTGACGGGCAAGTTGGA 

CGCCCGCAAGATCCGCGAGATTCTCATTAAGGCCAAGAAGGGCGGCA
AGATCGCCGTGTAATAATTCTAGA 

hluc+ver2A5 

AAAGCCACCATGGAAGATGCCAAAAACATTAAGAAGGGCCCAGCGC 
CATTCTACCCACTCGAAGACGGCACCGCCGGCGAGCAGCTGCACAAA
GCCATGAAGCGCTACGCCCTGGTGCCCGGCACCATCGCCTTTACCGA 
CGCACATATCGAGGTGGACATTACCTACGCCGAGTACTTCGAGATGA 
GCGTTCGGCTGGCAGAAGCTATGAAGCGCTATGGGCTGAACACCAAC 
CATCGGATCGTGGTGTGCAGCGAGAATAGCTTGCAGTTCTTCATGCC 
CGTGTTGGGTGCCCTGTTCATCGGTGTGGCTGTGGCCCCAGCTAACGA
CATCTACAACGAGCGCGAGCTGCTGAACAGCATGGGCATCAGCCAGC
CCACCGTCGTATTCGTGAGCAAGAAAGGGCTGCAAAAGATCCTCAAC
GTGCAAAAGAAGCTACCGATCATACAAAAGATCATCATCATGGATAG
CAAGACCGACTACCAGGGCTTCCAAAGCATGTACACCTTCGTGACTT
CCCATTTGCCACCCGGCTTCAACGAGTACGACTTCGTGCCCGAGAGC 
TTCGACCGGGACAAAACCATCGCCCTGATCATGAACAGTAGTGGCAG
TACCGGATTGCCCAAGGGCGTAGCCCTACCGCACCGCACCGCTTGTG 

TCCGATTCAGTCATGCCCGCGACCCCATCTTCGGCAACCAGATCATCC
CCGACACCGCTATCCTCAGCGTGGTGCCATTTCACCACGGCTTCGGCA 
TGTTCACCACGCTGGGCTACTTGATCTGCGGCTTTCGGGTCGTGCTCA 
TGTACCGCTTCGAGGAGGAGCTATTCTTGCGCAGCTTGCAAGACTAT 
AAGATTCAAAGCGCCCTGCTGGTGCCCACACTGTTCAGTTTCTTCGCT 

AAGAGCACTCTCATCGACAAGTACGACCTAAGCAACTTGCACGAGAT
CGCCAGCGGCGGGGCGCCGCTCAGCAAGGAGGTGGGCGAGGCCGTC 
GCCAAACGCTTCCACCTACCAGGCATCCGCCAGGGCTACGGCCTGAC
AGAAACAACCAGCGCCATTCTGATCACCCCCGAAGGGGACGACAAG 
CCTGGCGCAGTAGGCAAGGTGGTGCCCTTCTTCGAGGCTAAGGTGGT

GGACTTGGACACCGGTAAGACACTGGGTGTGAACCAGCGCGGCGAG
CTGTGCGTCCGTGGCCCCATGATCATGAGCGGCTACGTTAACAACCC 
CGAGGCTACAAACGCTCTCATCGACAAGGACGGCTGGCTGCACAGCG 
GCGACATCGCCTACTGGGACGAGGACGAGCACTTCTTCATCGTGGAC
CGGCTGAAGAGCCTGATCAAATACAAGGGCTACCAGGTAGCCCCAGC 

CGAACTGGAGAGCATCCTGCTGCAACACCCCAACATCTTCGACGCCG
GGGTCGCCGGCCTGCCCGACGACGATGCCGGCGAGCTGCCCGCCGCA 
GTCGTCGTGCTGGAACACGGTAAAACCATGACCGAGAAGGAGATCGT 
GGACTATGTGGCCAGCCAGGTTACAACCGCCAAGAAGCTGCGCGGTG 
GTGTTGTGTTCGTGGACGAGGTGCCTAAAGGCCTGACGGGCAAGTTG 

GACGCCCGCAAGATCCGCGAGATTCTCATTAAGGCCAAGAAGGGCG
GCAAGATCGCCGTGTAATAATTCTAGA 

hluc+ver2A6 has the following sequence:

AAAGCCACCATGGAaGAtGCCAAaAACATtAAGAAGGGCCCaGCgCCaT 
TCTACCCaCTcGAaGACGGCACCGCCGGCGAGCAGCTGCACAAaGCCA
TGAAGCGCTACGCCCTGGTGCCCGGCACCATCGCCTTtACCGACGCaC 
AtATCGAGGTGGACATtACCTACGCCGAGTACTTCGAGATGAGCGTtCG 
gCTGGCaGAaGCtATGAAGCGCTAtGGgCTGAAtACaAACCAtCGgATCGT 
GGTGTGCAGCGAGAAtAGCtTGCAGTTCTTCATGCCCGTGtTGGGtGCC
CTGTTCATCGGtGTGGCtGTGGCCCCaGCtAACGACATCTACAACGAGC 
GCGAGCTGCTGAACAGCATGGGCATCAGCCAGCCCACCGTcGTaTTCG 
TGAGCAAGAAaGGgCTGCAaAAGATCCTcAACGTGCAaAAGAAGCTaCC 
gATCATaCAaAAGATCATCATCATGGAtAGCAAGACCGACTACCAGGG
CTTCCAaAGCATGTACACCTTCGTGACttcCCAttTGCCaCCCGGCTTCAA 
CGAGTACGACTTCGTGCCCGAGAGCTTCGACCGgGACAAaACCATCGC 
CCTGATCATGAACAGtAGtGGCAGtACCGGatTgCCcAAGGGCGTaGCCC 
TaCCgCACCGCACCGCtTGtGTcCGaTTCAGtCAtGCCCGCGACCCCATCT 
TCGGCAACCAGATCATCCCCGACACCGCtATCCTcAGCGTGGTGCCaTT
tCACCACGGCTTCGGCATGTTCACCACgCTGGGCTACtTGATCTGCGGC
TTtCGgGTcGTGCTcATGTACCGCTTCGAGGAGGAGCTaTTCtTGCGCAG 
CtTGCAaGACTAtAAGATtCAaAGCGCCCTGCTGGTGCCCACaCTGTTCA 
GtTTCTTCGCtAAGAGCACtCTcATCGACAAGTACGACCTaAGCAACtTG
CACGAGATCGCCAGCGGCGGgGCgCCgCTcAGCAAGGAGGTaGGtGAG
GCCGTGGCCAAaCGCTTCCACCTaCCaGGCATCCGCCAGGGCTACGGC 
CTGACaGAaACaACCAGCGCCATtCTGATCACCCCCGAaGGgGACGACA 
AGCCtGGCGCaGTaGGCAAGGTGGTGCCCTTCTTCGAGGCtAAGGTGGT 
GGACtTGGACACCGGtAAgACaCTGGGtGTGAACCAGCGCGGCGAGCTG 
TGCGTcCGtGGCCCCATGATCATGAGCGGCTACGTtAACAACCCCGAG
GCtACaAACGCtCTcATCGACAAGGACGGCTGGCTGCACAGCGGCGAC 
ATCGCCTACTGGGACGAGGACGAGCACTTCTTCATCGTGGACCGgCT 
GAAGAGCCTGATCAAaTACAAGGGCTACCAGGTaGCCCCaGCCGAaCT 
GGAGAGCATCCTGCTGCAaCACCCCAACATCTTCGACGCCGGgGTcGC 
CGGCCTGCCCGACGACGAtGCCGGCGAGCTGCCCGCCGCaGTcGTcGT
GCTGGAaCACGGtAAaACCATGACCGAGAAGGAGATCGTGGACTAtGT 
GGCCAGCCAGGTtACaACCGCCAAGAAGCTGCGCGGtGGtGTtGTGTTC 
GTGGACGAGGTGCCtAAaGGCCTGACgGGCAAGtTGGACGCCCGCAAG 
ATCCGCGAGATtCTcATtAAGGCCAAGAAGGGCGGCAAGATCGCCGTG 
TAATAATTCTAGA (SEQ ID NO:21).

The hluc+ver2A6 sequence was modified, yielding hluc+ver2A7:
AAAGCCACCATGGAaGAtGCCAAaAACATtAAGAA
GGGCCCaGCgCCaTTCTACCCaCTcGAaGACGGgACCGCCGGCGAGCAG 
CTGCACAAaGCCATGAAGCGCTACGCCCTGGTGCCCGGCACCATCGC 
CTTtACCGACGCaCAtATCGAGGTGGACATtACCTACGCCGAGTACTTC 

GAGATGAGCGTtCGgCTGGCaGAaGCtATGAAGCGCTAtGGgCTGAAtAC
aAACCAtCGgATCGTGGTGTGCAGCGAGAAtAGCtTGCAGTTCTTCATGC 
CCGTGtTGGGtGCCCTGTTCATCGGtGTGGCtGTGGCCCCaGCtAACGAC 
ATCTACAACGAGCGCGAGCTGCTGAACAGCATGGGCATCAGCCAGCC 
CACCGTcGTaTTCGTGAGCAAGAAaGGgCTGCAaAAGATCCTcAACGTG 

CAaAAGAAGCTaCCgATCATaCAaAAGATCATCATCATGGAtAGCAAGA
CCGACTACCAGGGCTTCCAaAGCATGTACACCTTCGTGACttcCCAttTG 
CCaCCCGGCTTCAACGAGTACGACTTCGTGCCCGAGAGCTTCGACCGg 
GACAAaACCATCGCCCTGATCATGAACAGtAGtGGCAGtACCGGatTgCC 
cAAGGGCGTaGCCCTaCCgCACCGCACCGCtTGtGTcCGaTTCAGtCAtGCC 

CGCGACCCCATCTTCGGCAACCAGATCATCCCCGACACCGCtATCCTc
AGCGTGGTGCCaTTtCACCACGGCTTCGGCATGTTCACCACgCTGGGCT 
ACtTGATCTGCGGCTTtCGgGTcGTGCTcATGTACCGCTTCGAGGAGGAG 
CTaTTCtTGCGCAGCtTGCAaGACTAtAAGATtCAatctGCCCTGCTGGTGC 
CCACaCTaTTtAGcTTCTTCGCtAAGAGCACtCTcATCGACAAGTACGACC 

TaAGCAACtTGCACGAGATCGCCAGCGGCGGgGCgCCgCTcAGCAAGGA
GGTaGGtGAGGCCGTGGCCAAaCGCTTCCACCTaCCaGGCATCCGCCAG 
GGCTACGGCCTGACaGAaACaACCAGCGCCATtCTGATCACCCCCGAaG 
GgGACGACAAGCCtGGCGCaGTaGGCAAGGTGGTGCCCTTCTTCGAGG 
CtAAGGTGGTGGACtTGGACACCGGtAAgACaCTGGGtGTGAACCAGCG 

CGGCGAGCTGTGCGTcCGtGGCCCCATGATCATGAGCGGCTACGTtAA
CAACCCCGAGGCtACaAACGCtCTcATCGACAAGGACGGCTGGCTGCA 
CAGCGGCGACATCGCCTACTGGGACGAGGACGAGCACTTCTTCATCG 
TGGACCGgCTGAAGAGCCTGATCAAaTACAAGGGCTACCAGGTaGCCC 
CaGCCGAaCTGGAGAGCATCCTGCTGCAaCACCCCAACATCTTCGACG 

CCGGgGTcGCCGGCCTGCCCGACGACGAtGCCGGCGAGCTGCCCGCCG
CaGTcGTcGTGCTGGAaCACGGtAAaACCATGACCGAGAAGGAGATCGT 
GGACTAtGTGGCCAGCCAGGTtACaACCGCCAAGAAGCTGCGCGGtGGt 
GTtGTGTTCGTGGACGAGGTGCCtAAaGGCCTGACgGGCAAGtTGGACG 
CCCGCAAGATCCGCGAGATtCTcATtAAGGCCAAGAAGGGCGGCAAGA
TCGCCGTGTAATAATTCTAGA (SEQ ID NO:22). 

For vectors with a BglII site in the multiple cloning region, the BglII site present in
the firefly sequence can be removed. The luciferase gene from hluc+ver2AF8,
which lacks a BglII site, displays an average 7.2-fold increase in expression
when assayed in four mammalian cell lines, i.e., NIH3T3, CHO, HeLa and
HEK293 cells.
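Screening a candidate sequence for a site such as BglII (AGATCT) before cloning can be done with a short script. The following is a minimal sketch in Python, not part of the procedure described here; the helper names (find_sites, reverse_complement, site_pattern) are illustrative, and only the IUPAC code N is supported, which is sufficient for BglII and for degenerate sites such as PflMI (CCANNNNNTGG). Lowercase bases, used in the listings above to mark altered positions, are treated as ordinary bases.

```python
import re

# Only the IUPAC symbols needed for the sites discussed here; extend as required.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T/N sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def site_pattern(site: str) -> "re.Pattern[str]":
    """Compile a recognition site into a zero-width lookahead so that
    overlapping occurrences are all reported."""
    body = "".join(IUPAC[base] for base in site.upper())
    return re.compile(f"(?={body})")


def find_sites(seq: str, site: str) -> list:
    """Return sorted 0-based top-strand start positions where `site`
    occurs on either strand of `seq`. Whitespace and case are ignored."""
    s = "".join(seq.split()).upper()
    hits = set()
    # Searching for the site and its reverse complement covers both strands;
    # for palindromic sites (BglII, PflMI, SbfI) the two motifs are identical.
    for motif in {site.upper(), reverse_complement(site)}:
        hits.update(m.start() for m in site_pattern(motif).finditer(s))
    return sorted(hits)
```

For example, `find_sites("TTAGATCTGG", "AGATCT")` returns `[2]`, and a variant intended for a vector that carries BglII in its multiple cloning region should give an empty list.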

hluc+ver2A8 has the following sequence: 

AAAGCCACCATGGAaGAtGCCAAaAACATtAAGAAGGGCCCaGCgCCaT 

TCTACCCaCTcGAaGACGGgACCGCCGGCGAGCAGCTGCACAAaGCCA 

TGAAGCGCTACGCCCTGGTGCCCGGCACCATCGCCTTtACCGACGCaC 

AtATCGAGGTGGACATtACCTACGCCGAGTACTTCGAGATGAGCGTtCG 

gCTGGCaGAaGCtATGAAGCGCTAtGGgCTGAAtACaAACCAtCGgATCGT

GGTGTGCAGCGAGAAtAGCtTGCAGTTCTTCATGCCCGTGtTGGGtGCC 

CTGTTCATCGGtGTGGCtGTGGCCCCaGCtAACGACATCTACAACGAGC 

GCGAGCTGCTGAACAGCATGGGCATCAGCCAGCCCACCGTcGTaTTCG 

TGAGCAAGAAaGGgCTGCAaAAGATCCTcAACGTGCAaAAGAAGCTaCC 

gATCATaCAaAAGATCATCATCATGGAtAGCAAGACCGACTACCAGGG 

CTTCCAaAGCATGTACACCTTCGTGACttcCCAttTGCCaCCCGGCTTCAA 

CGAGTACGACTTCGTGCCCGAGAGCTTCGACCGgGACAAaACCATCGC 

CCTGATCATGAACAGtAGtGGCAGtACCGGatTgCCcAAGGGCGTaGCCC 

TaCCgCACCGCACCGCtTGtGTcCGaTTCAGtCAtGCCCGCGACCCCATCT 

TCGGCAACCAGATCATCCCCGACACCGCtATCCTcAGCGTGGTGCCaTT 

tCACCACGGCTTCGGCATGTTCACCACgCTGGGCTACtTGATCTGCGGC 

TTtCGgGTcGTGCTcATGTACCGCTTCGAGGAGGAGCTaTTCtTGCGCAG 

CtTGCAaGACTAtAAGATtCAatctGCCCTGCTGGTGCCCACaCTaTTtAGcT 

TCTTCGCtAAGAGCACtCTcATCGACAAGTACGACCTaAGCAACtTGCAC 

GAGATCGCCAGCGGCGGgGCgCCgCTcAGCAAGGAGGTaGGtGAGGCC 

GTGGCCAAaCGCTTCCACCTaCCaGGCATCCGCCAGGGCTACGGCCTG 

ACaGAaACaACCAGCGCCATtCTGATCACCCCCGAaGGgGACGACAAGC 

CtGGCGCaGTaGGCAAGGTGGTGCCCTTCTTCGAGGCtAAGGTGGTGGA

CtTGGACACCGGtAAgACaCTGGGtGTGAACCAGCGCGGCGAGCTGTGC 
GTcCGtGGCCCCATGATCATGAGCGGCTACGTtAACAACCCCGAGGCtA
CaAACGCtCTcATCGACAAGGACGGCTGGCTGCACAGCGGCGACATCG 
CCTACTGGGACGAGGACGAGCACTTCTTCATCGTGGACCGgCTGAAG 
AGCCTGATCAAaTACAAGGGCTACCAGGTaGCCCCaGCCGAaCTGGAG 
AGCATCCTGCTGCAaCACCCCAACATCTTCGACGCCGGgGTcGCCGGC
CTGCCCGACGACGAtGCCGGCGAGCTGCCCGCCGCaGTcGTcGTGCTGG 
AaCACGGtAAaACCATGACCGAGAAGGAGATCGTGGACTAtGTGGCCA 
GCCAGGTtACaACCGCCAAGAAGCTGCGCGGtGGtGTtGTGTTCGTGGA 
CGAGGTGCCtAAaGGaCTGACcGGCAAGtTGGACGCCCGCAAGATCCGC 
GAGATtCTcATtAAGGCCAAGAAGGGCGGCAAGATCGCCGTGTAATAA
TTCTAGA (SEQ ID NO:23). 

For the second approach, firefly luciferase luc+ codons were optimized for
mammalian expression, and the numbers of consensus transcription factor binding
sites and CG dinucleotides (CG islands, potential methylation sites) were reduced.
The second approach yielded versions hluc+ver2BF1 through hluc+ver2BF5.
hluc+ver2BF1 is codon-optimized; hluc+ver2BF2 is the sequence obtained after a
first round of removal of identified undesired sequences, including transcription
factor binding sites; hluc+ver2BF3, hluc+ver2BF4 and hluc+ver2BF5 were obtained
after a second, third and fourth round, respectively, of such removal;
hluc+ver2BF6 was obtained after removal of promoter modules and RBS;
hluc+ver2BF7 was obtained after further removal of identified undesired
sequences, including transcription factor binding sites; and hluc+ver2BF8 was
obtained after modifying a restriction enzyme recognition site.
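The CG dinucleotide counts reported for these constructs in Table 11 can be approached with a simple count. The sketch below is an illustration under the assumption that a CG dinucleotide is any 5'-CG-3' step on the listed strand; the function names count_cg and cg_positions are hypothetical, and the method actually used to tabulate the counts is not stated here. Because CG is its own reverse complement, counting on one strand gives the same number as counting on the other.

```python
def count_cg(seq: str) -> int:
    """Count CG dinucleotides (5'-CG-3') in a DNA sequence.

    Whitespace and case are ignored, because the listings wrap across lines
    and use lowercase letters to mark altered bases. CG cannot overlap
    itself, so str.count on one strand is exact.
    """
    s = "".join(seq.split()).upper()
    return s.count("CG")


def cg_positions(seq: str) -> list:
    """0-based positions of each CG, useful for choosing codons to change."""
    s = "".join(seq.split()).upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]
```

Listing positions rather than only a total makes it easier to see which codons could be exchanged for synonymous codons lacking a CG step.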

hluc+ver2B1-B5 have the following sequences (SEQ ID NOs: 24-28):
hluc+ver2B1

AAAGCCACCATGGAGGATGCTAAGAATATTAAGAAGGGGCCTGCTCC 
TTTTTATCCTCTGGAGGATGGGACAGCTGGGGAGCAGCTGCATAA^ 
CTATGAAGAGATATGCTCTGGTGCCTGGGACAATTGCTTTTACAGATG
CTCATATTGAGGTGGATATTACATATGCTGAGTATTTTGAGATGTCTG 
TGAGACTGGCTGAGGCTATGAAGAGATATGGGCTGAATACAAATCAT 
AGAATTGTGGTGTGTTCTGAGAATTCTCTGCAGTTTTTTATGCCTGTG 

CTGGGGGCTCTGTTTATTGGGGTGGCTGTGGCTCCTGCTAATGATATT
TATAATGAGAGAGAGCTGCTGAATTCTATGGGGATTTCTCAGCCTAC 
AGTGGTGTTTGTGTCTAAGAAGGGGCTGCAGAAGATTCTGAATGTGC 
AGAAGAAGCTGCCTATTATTCAGAAGATTATTATTATGGATTCTAAG 
ACAGATTATCAGGGGTTTCAGTCTATGTATACATTTGTGACATCTCAT 

CTGCCTCCTGGGTTTAATGAGTATGATTTTGTGCCTGAGTCTTTTGAT
AGAGATAAGACAATTGCTCTGATTATGAATTCTTCTGGGTCTACAGG 
GCTGCCTAAGGGGGTGGCTCTGCCTCATAGAACAGCTTGTGTGAGAT 
TTTCTCATGCTAGAGATCCTATTTTTGGGAATCAGATTATTCCTGATA 
CAGCTATTCTGTCTGTGGTGCCTTTTCATCATGGGTTTGGGATGTTTAC 

AACACTGGGGTATCTGATTTGTGGGTTTAGAGTGGTGCTGATGTATAG
ATTTGAGGAGGAGCTGTTTCTGAGATCTCTGCAGGATTATAAGATTCA 

CTGATTGATAAGTATGATCTGTCTAATCTGCATGAGATTGCTTCTGGG 
GGGGCTCCTCTGTCTAAGGAGGTGGGGGAGGCTGTGGCTAAGAGATT 

TCATCTGCCTGGGATTAGACAGGGGTATGGGCTGACAGAGACAACAT
CTGCTATTCTGATTACACCTGAGGGGGATGATAAGCCTGGGGCTGTG 
GGGAAGGTGGTGCCTTTTTTTGAGGCTAAGGTGGTGGATCTGGATAC 
AGGGAAGACACTGGGGGTGAATCAGAGAGGGGAGCTGTGTGTGAGA 
GGGCCTATGATTATGTCTGGGTATGTGAATAATCCTGAGGCTACAAA 

TGCTCTGATTGATAAGGATGGGTGGCTGCATTCTGGGGATATTGCTTA
TTGGGATGAGGATGAGCATTTTTTTATTGTGGATAGACTGAAGTCTCT 
GATTAAGTATAAGGGGTATCAGGTGGCTCCTGCTGAGCTGGAGTCTA 
TTCTGCTGCAGCATCCTAATATTTTTGATGCTGGGGTGGCTGGGCTGC 
CTGATGATGATGCTGGGGAGCTGCCTGCTGCTGTGGTGGTGCTGGAG 

CATGGGAAGACAATGACAGAGAAGGAGATTGTGGATTATGTGGCTTC
TCAGGTGACAACAGCTAAGAAGCTGAGAGGGGGGGTGGTGTTTGTGG 
ATGAGGTGCCTAAGGGGCTGACAGGGAAGCTGGATGCTAGAAAGAT 
TAGAGAGATTCTGATTAAGGCTAAGAAGGGGGGGAAGATTGCTGTGT 
AATAATTCTAGA
hluc+ver2B2 

AAAGCCACCATGGAAGATGCTAAAAACATTAAGAAGGGGCCTGCTCC 
TTTCTACCCTCTGGAGGATGGGACTGCCGGGGAGCAGCTGCATAAAG
CTATGAAGCGGTATGCTCTGGTGCCAGGCACAATTGCGTTCACGGAT 
GCTCACATTGAGGTGGACATTACATACGCTGAGTATTTTGAGATGTCG 
GTGCGGCTGGCTGAGGCTATGAAGCGATATGGGCTGAATACAAACCA 
TAGAATTGTAGTGTGCTCTGAGAACTCGTTGCAGTTTTTTATGCCTGT 

GCTGGGGGCTCTCTTCATCGGGGTGGCTGTGGCTCCTGCTAACGACAT
TTACAATGAGAGAGAGCTTTTGAACTCGATGGGGATTTCTCAGCCTA 
CAGTGGTGTTTGTGAGTAAGAAAGGGCTTCAAAAGATTCTCAATGTG 
CAAAAGAAGCTGCCTATTATTCAAAAGATTATTATTATGGACTCTAA 
GACAGACTACCAGGGGTTTCAGTCTATGTATACATTTGTGACATCTCA 

TCTGCCTCCTGGGTTCAACGAGTATGACTTTGTGCCCGAGTCTTTCGA
CAGAGATAAGACAATTGCTCTGATTATGAATTCATCTGGGTCTACCG 
GGCTGCCTAAGGGTGTAGCTCTGCCACATAGAACAGCTTGTGTGAGA 
TTTTCTCATGCTAGGGACCCTATTTTTGGGAATCAGATTATTCCTGAT 
ACTGCTATTCTGTCGGTTGTGCCCTTTCATCATGGGTTTGGGATGTTTA

CAACACTGGGCTACCTGATATGTGGGTTTAGAGTGGTGCTCATGTATA
GGTTTGAGGAGGAGCTTTTTTTGCGCTCTCTGCAAGATTATAAGATTC 
AGTCTGCTCTGCTGGTGCCTACACTGTTTTCTTTTTTTGCTAAGTCTAC 
CCTGATCGATAAGTATGATCTGTCCAACCTGCACGAGATTGCTTCTGG 
GGGGGCTCCTCTGTCTAAGGAGGTAGGTGAGGCTGTGGCTAAGCGCT 

TTCATCTGCCTGGAATCAGACAGGGGTATGGGCTAACAGAAACAACA
TCTGCTATTCTGATTACACCAGAGGGGGATGATAAGCCCGGGGCTGT 
AGGGAAAGTGGTGCCCTTTTTTGAAGCTAAAGTAGTTGATCTTGATAC 
CGGTAAGACACTGGGGGTGAATCAGCGAGGGGAACTGTGTGTGAGA 
GGGCCTATGATTATGTCGGGGTATGTGAACAACCCTGAGGCTACAAA 

TGCTCTGATTGATAAGGATGGGTGGCTGCATTCGGGCGATATTGCTTA
CTGGGATGAGGATGAGCATTTCTTCATCGTGGACAGACTGAAGTCGT 
TGATCAAATATAAGGGGTATCAAGTAGCTCCTGCTGAGCTGGAGTCC 
ATTCTGCTTCAACATCCTAACATTTTCGATGCTGGGGTGGCTGGGCTG 
CCTGATGATGATGCTGGGGAGCTGCCTGCTGCTGTAGTGGTGCTGGA
GCACGGTAAGACAATGACAGAGAAGGAGATTGTGGATTATGTGGCTT 
CACAAGTGACAACAGCTAAGAAACTGAGAGGTGGCGTTGTGTTTGTG 
GATGAGGTGCCTAAAGGGCTGACAGGCAAGCTGGATGCTAGAAAAA 
TTCGAGAGATTCTGATTAAGGCTAAGAAGGGTGGAAAGATTGCTGTG
TAATAGTTCTAGA 

hluc+ver2B3 

AAAGCCACCATGGAAGATGCTAAAAACATTAAGAAGGGGCCTGCTCC 
TTTCTACCCTCTTGAAGATGGGACTGCTGGCGAGCAACTTCACAAAG
CTATGAAGCGGTATGCTCTTGTGCCAGGCACAATTGCGTTCACGGAT 
GCTCACATTGAGGTGGACATCACATACGCTGAGTATTTTGAGATGTC 
GGTGCGGCTGGCAGAAGCTATGAAGCGCTATGGGCTGAATACAAACC 
ATAGAATTGTAGTGTGCAGTGAGAACTCGTTGCAGTTCTTTATGCCCG 
TGCTGGGGGCTCTCTTCATCGGGGTGGCTGTGGCTCCTGCTAACGACA
TCTACAACGAGCGAGAGCTGTTGAACTCGATGGGGATTTCTCAGCCT

acagtggtgtttgtgagtaagaaagggcttcaaaagattctcaatgt 
gcaaaagaagctgcctattattcaaaagattattattatggactcta 
agaccgactaccaggggtttcagtctatgtatacatttgtgacatctc 

atctgcctcctggcttcaacgagtacgacttcgtgcccgagtctttcg
acagagataagacaattgctctgatcatgaattcatccgggtctacc 
gggctgcctaagggtgtagctctgccccatagaacagcttgtgtgag 
attttctcatgctagggaccctatttttgggaatcagattattcctga 
cactgctattctgtcggtggtgccctttcatcatgggtttgggatgtt 

tacaacactgggctacctaatatgtgggtttagagtggtgctcatgta
taggtttgaagaagagctgttcttacgctctttgcaagattataagat 
tcagtctgctctgctggtgccaacactattctctttttttgctaagtct 

ACGCTCATAGACAAGTATGACTTGTCCAACTTGCACGAGATTGCTTCT
GGCGGAGCACCTCTGTCTAAGGAGGTAGGTGAGGCTGTGGCTAAGCG 
CTTTCATCTGCCTGGTATCAGACAGGGGTATGGGCTAACAGAAACAA
CATCTGCTATTCTGATTACACCAGAGGGGGATGATAAGCCCGGGGCT 
GTAGGGAAAGTGGTGCCCTTTTTTGAAGCCAAAGTAGTTGATCTTGAT 
ACCGGTAAGACACTAGGGGTGAACCAGCGTGGTGAACTGTGTGTGAG
AGGGCCTATGATTATGTCGGGGTACGTTAACAACCCCGAAGCTACAA
ATGCTCTGATTGATAAGGATGGCTGGCTGCATTCGGGCGACATTGCTT 
ACTGGGATGAGGATGAGCATTTCTTCATCGTGGACAGACTGAAGTCG 
TTGATCAAATACAAGGGGTATCAAGTAGCTCCTGCTGAGCTGGAATC 

CATTCTGCTTCAACATCCCAACATTTTCGATGCTGGGGTGGCTGGGCT
GCCTGATGATGATGCTGGGGAGTTGCCTGCTGCTGTAGTGGTGCTTGA 
GCACGGTAAGACAATGACAGAGAAGGAGATCGTGGATTATGTGGCTT 
CACAAGTGACAACAGCTAAGAAACTGAGAGGTGGCGTTGTGTTTGTG 
GATGAGGTGCCTAAAGGGCTCACTGGCAAGCTGGATGCTAGAAAAAT 

TCGAGAGATTCTGATTAAGGCTAAGAAGGGTGGAAAGATTGCTGTGT
AATAGTTCTAGA 

hluc+ver2B4 

AAAGCCACCATGGAAGATGCTAAAAACATTAAGAAGGGGCCTGCTCC 

CTTCTACCCTCTTGAAGATGGGACTGCTGGCGAGCAACTTCACAAAG
CTATGAAGCGGTATGCTCTTGTGCCAGGCACAATTGCGTTCACGGAT 
GCTCACATTGAGGTGGACATCACATACGCTGAGTATTTTGAGATGTC 
GGTGCGGCTGGCAGAAGCTATGAAGCGCTATGGGCTGAATACAAACC 
ATAGAATTGTAGTGTGCAGTGAGAACTCGTTGCAGTTCTTTATGCCCG 

TGCTGGGGGCTCTCTTCATCGGGGTGGCTGTGGCTCCTGCTAACGACA
TCTACAACGAGCGAGAGCTGTTGAACTCGATGGGGATCTCTCAGCCT 
ACAGTGGTGTTTGTGAGTAAGAAAGGGCTTCAAAAGATTCTCAATGT 
GCAAAAGAAGCTGCCTATTATTCAAAAGATTATTATTATGGACTCTA 
AGACAGACTACCAGGGGTTTCAGTCCATGTATACATTTGTGACATCTC 

ATCTGCCTCCTGGCTTCAACGAGTACGACTTCGTGCCCGAGTCTTTCG
ACAGAGATAAGACAATTGCTCTGATCATGAATTCATCCGGGTCTACC 
GGGCTGCCTAAGGGTGTAGCTCTGCCCCATCGAACAGCTTGTGTGAG 
ATTCTCTCATGCCAGGGACCCGATCTTTGGGAATCAGATTATTCCTGA 
CACTGCTATTCTGTCGGTGGTGCCCTTTCATCATGGGTTTGGGATGTT 

TACAACACTGGGATACCTAATATGTGGGTTTAGAGTGGTGCTCATGT
ATAGGTTTGAAGAAGAACTGTTCTTACGCTCTTTGCAAGATTATAAGA 

TACGCTCATAGACAAGTATGACTTGTCCAACTTGCACGAGATTGCTTC 
TGGCGGAGCACCTCTGTCTAAGGAGGTAGGTGAGGCTGTGGCTAAGC
GCTTTCATCTGCCTGGTATCAGACAGGGGTACGGGCTAACAGAAACA 
ACTTCTGCTATTCTGATTACACCAGAGGGCGATGACAAGCCCGGGGC 
TGTAGGGAAAGTGGTGCCCTTTTTTGAAGCCAAAGTAGTTGATCTTGA 

TACCGGTAAGACACTAGGGGTGAACCAGCGTGGTGAACTGTGTGTGC
GGGGCCCTATGATTATGTCGGGGTACGTTAACAACCCCGAAGCTACA 
AATGCTCTTATTGATAAGGATGGCTGGTTGCATTCGGGCGACATTGCC 
TACTGGGATGAGGATGAGCATTTCTTCATCGTGGACAGACTGAAGTC 
GTTGATCAAATACAAGGGGTATCAAGTAGCTCCTGCTGAGCTGGAAT 

CCATTCTGCTTCAACATCCAAACATTTTCGATGCTGGGGTGGCTGGGC
TGCCTGATGATGATGCTGGAGAGTTGCCTGCTGCTGTAGTAGTGCTTG 
AGCACGGTAAGACAATGACAGAGAAGGAGATCGTGGATTATGTGGC 
TTCACAAGTGACAACAGCTAAGAAACTGAGAGGTGGCGTTGTGTTTG 
TGGATGAGGTGCCTAAAGGGCTCACTGGCAAGCTGGATGCCAGAAAA 

ATTCGAGAGATTCTCATTAAGGCTAAGAAGGGTGGAAAGATTGCTGT
GTAATAGTTCTAGA 

hluc+ver2B5 

AAAGCCACCATGGAAGATGCTAAAAACATTAAGAAGGGGCCTGCTCC 
CTTCTACCCTCTTGAAGATGGGACTGCTGGCGAGCAACTTCACAAAG
CTATGAAGCGGTATGCTCTTGTGCCAGGCACAATTGCGTTCACGGAT 
GCTCACATTGAGGTGGACATCACATACGCTGAGTATTTTGAGATGTC 
GGTGCGGCTGGCAGAAGCTATGAAGCGCTATGGGCTGAATACAAACC 
ATAGAATTGTAGTGTGCAGTGAGAACTCGTTGCAGTTCTTTATGCCCG 
TGCTGGGGGCTCTCTTCATCGGGGTGGCTGTGGCTCCTGCTAACGACA
TCTACAACGAGCGAGAGCTGTTGAACTCGATGGGGATCTCTCAGCCT 
ACAGTGGTGTTTGTGAGTAAGAAAGGGCTTCAAAAGATTCTCAATGT 
GCAAAAGAAGCTGCCTATTATACAAAAGATTATTATTATGGACTCTA 
AGACCGACTACCAGGGGTTTCAGTCCATGTACACATTTGTAACCTCTC 
ATCTGCCTCCTGGCTTCAACGAGTACGACTTCGTGCCCGAGTCTTTCG
ACAGGGACAAAACGATTGCTCTGATCATGAACTCATCCGGGTCTACC
GGGCTGCCTAAGGGTGTAGCTCTGCCCCATCGAACAGCTTGTGTGAG 
ATTCTCTCATGCCAGGGACCCGATCTTTGGGAATCAGATTATTCCTGA 
CACTGCTATTCTGTCGGTGGTGCCCTTTCATCATGGGTTTGGGATGTT
CACAACACTGGGATACCTCATTTGCGGGTTTAGAGTGGTGCTCATGTA
TAGGTTTGAAGAAGAACTATTCCTACGCTCTTTGCAAGATTATAAGAT 
TCAGTCTGCTCTGCTGGTGCCAACACTATTCTCTTTTTTTGCTAAGTCT 

ACGCTCATAGACAAGTATGACTTGTCCAACTTGCACGAGATTGCTTCT
GGCGGAGCACCTCTGTCTAAGGAGGTAGGTGAGGCTGTGGCTAAGCG 
CTTTCATCTGCCTGGTATCAGACAGGGGTACGGGCTAACAGAAACAA 
CTTCTGCTATTCTGATTACACCAGAGGGCGATGACAAACCCGGGGCT 
GTAGGGAAAGTGGTGCCCTTTTTTGAAGCCAAAGTAGTTGATCTTGAT 

ACCGGTAAGACACTAGGGGTGAACCAGCGTGGTGAACTGTGTGTGCG
GGGCCCTATGATTATGTCGGGGTACGTTAACAACCCCGAAGCTACAA 
ATGCTCTTATTGATAAGGATGGCTGGTTGCATTCGGGCGACATTGCCT 
ACTGGGATGAGGATGAGCATTTCTTCATCGTGGACAGACTGAAGTCG 
TTGATCAAATACAAGGGGTATCAAGTAGCTCCTGCTGAGCTGGAATC 

CATTCTGCTTCAACATCCTAACATTTTCGATGCTGGGGTGGCTGGGCT
GCCTGATGATGATGCTGGAGAGTTGCCTGCTGCTGTAGTAGTGCTTGA 
GCACGGTAAGACAATGACAGAGAAGGAGATCGTGGATTATGTGGCTT 
CACAAGTGACAACAGCTAAGAAACTGAGAGGTGGCGTTGTGTTTGTG
GATGAGGTGCCTAAAGGGCTCACTGGCAAGCTGGATGCCAGAAAAAT 

TCGAGAGATTCTCATTAAGGCTAAGAAGGGTGGAAAGATTGCTGTGT
AATAGTTCTAGA 

hluc+ver2B6 has the following sequence: 

AAAGCCACCATGGAaGATGCcAAaAAcATTAAGAAGGGGCCTGCTCCc
TTcTAcCCTCTtGAaGATGGGACtGCtGGcGAGCAaCTtCAcAAaGCTATGA
AGcGgTATGCTCTtGTGCCaGGcACAATTGCgTTcACgGATGCTCAcATTG 
AaGTaGAcATcACATAcGCTGAGTATTTTGAGATGTCgGTGcGgCTGGCa 
GAaGCTATGAAGcGcTATGGGCTGAATACAAAcCATAGAATTGTaGTGT 
GcagTGAGAAcTCgtTGCAGTTcTTTATGCCcGTGCTGGGGGCTCTcTTcAT 
cGGGGTGGCTGTGGCTCCTGCTAAcGAcATcTAcAAcGAGcGAGAGCTgt
TGAAcTCgATGGGGATcTCTCAGCCTACAGTGGTGTTTGTGagTAAGAA 
aGGGCTtCAaAAGATTCTcAATGTGCAaAAGAAGCTGCCTATTATaCAaA 
AGATTATTATTATGGAcTCtAAGACcGAcTAcCAGGGGTTTCAGTCcATG 
TAcACATTTGTaACcTCTCATCTGCCTCCTGGcTTcAAcGAGTAcGAcTTc
GTGCCcGAGTCTTTcGAcAGgGAcAAaACgATTGCTCTGATcATGAAcagc 
TCcGGGTCTACcGGGCTGCCTAAGGGtGTaGCTCTGCCcCATcGAACAGC 
TTGTGTGAGATTcTCTCATGCcAGgGAcCCgATcTTtGGaAAcCAGATcATc 

CCTGAcACtGCTATTCTGTCgGTgGTGCCcTTTCATCATGGGTTTGGGAT
GTTcACAACACTGGGaTAccTcATtTGcGGGTTTAGAGTGGTGCTcATGTA 
TAGgTTTGAaGAaGAaCTaTTccTacGcTCTtTGCAaGATTATAAGATTCAG 
TCTGCTCTGCTGGTGCCaACACTaTTcTCTTTTTTTGCTAAGTCTACgCTc
ATaGAcAAGTATGActTGTCcAActTGCAcGAGATTGCTTCTGGcGGaGCa 

CCTCTGTCTAAGGAGGTaGGtGAGGCTGTGGCTAAGcGcTTTCATCTGC
CTGGtATcAGACAGGGGTAcGGGCTaACAGAaACAACtTCTGCTATTCTG 
ATTACACCaGAGGGcGATGAcAAaCCcGGGGCTGTaGGGAAaGTGGTGC 
CcTTTTTTGAaGCcAAaGTaGTtGATCTtGATACcGGtAAGACACTaGGGGT 
GAAcCAGcGtGGtGAaCTGTGTGTGcGgGGcCCTATGATTATGTCgGGGTA 

cGTtAAcAAcCCcGAaGCTACAAATGCTCTcATaGAcAAGGAcGGgTGGcTt
CATagcGGcGAcATTGCcTAcTGGGAcGAGGATGAGCATTTcTTcATcGTG 
GAcAGACTGAAGTCgtTGATcAAaTAcAAGGGGTATCAaGTaGCTCCTGC 
TGAGCTGGAaTCcATTCTGCTtCAaCAcCCcAAtATcTTcGATGCTGGGGT 
GGCTGGGCTGCCTGATGATGATGCTGGaGAGcTGCCTGCTGCTGTaGTa 

GTGCTtGAGCAcGGtAAGACAATGACAGAGAAGGAGATcGTGGATTAT
GTGGCTTCaCAaGTGACAACAGCTAAGAAaCTGAGAGGtGGcGTtGTGT 
TTGTGGATGAGGTGCCTAAaGGGCTcACtGGcAAGCTGGATGCcAGAAA 
aATTcGAGAGATTCTcATTAAGGCTAAGAAGGGtGGaAAGATTGCTGTG 
TAATAgTTCTAGA (SEQ ID NO:29). 


hluc+ver2BF8 was created by removing a Ptx1 consensus transcription factor
binding site from hluc+ver2BF7.

hluc+ver2B7 has the following sequence: 
AAAGCCACCATGGAAGATGCCAAAAACATTAAGAAGGGGCCTGCTC
CCTTCTACCCTCTTGAAGATGGGACTGCTGGCGAGCAACTTCACAAA 
GCTATGAAGCGGTATGCTCTTGTGCCAGGGACAATTGCGTTCACGGA 
TGCTCACATTGAAGTAGACATCACATACGCTGAGTATTTTGAGATGTC 
GGTGCGGCTGGCAGAAGCTATGAAGCGCTATGGGCTGAATACAAACC
ATAGAATTGTAGTGTGCAGTGAGAACTCGTTGCAGTTCTTTATGCCCG 
TGCTGGGGGCTCTCTTCATCGGGGTGGCTGTGGCTCCTGCTAACGACA 
TCTACAACGAGCGAGAGCTGTTGAACTCGATGGGGATCTCTCAGCCT 

ACAGTGGTGTTTGTGAGTAAGAAAGGGCTTCAAAAGATTCTCAATGT
GCAAAAGAAGCTGCCTATTATACAAAAGATTATTATTATGGACTCTA 
AGACAGACTACCAGGGGTTTCAGTCCATGTACACATTTGTAACCTCTC 
ATCTGCCTCCTGGCTTCAACGAGTACGACTTCGTGCCCGAGTCTTTCG 
ACAGGGACAAAACGATTGCTCTGATCATGAACAGCTCCGGGTCTACC 

GGGCTGCCTAAGGGTGTAGCTCTGCCCCATCGAACAGCTTGTGTGAG
ATTCTCTCATGCCAGGGACCCGATCTTTGGAAACCAGATCATCCCTGA 
CACTGCTATTCTGTCGGTGGTGCCCTTTCATCATGGGTTTGGGATGTT 
CACAACACTGGGATACCTCATTTGCGGGTTTAGAGTGGTGCTCATGTA 
TAGGTTTGAAGAAGAACTATTCCTACGCTCTTTGCAAGATTATAAGAT 

TCAGTCTGCTCTGCTGGTGCCAACACTATTCTCTTTTTTTGCTAAGTCT
ACGCTCATAGACAAGTATGACTTGTCCAACTTGCACGAGATTGCTTCT 
GGCGGAGCACCTCTGTCTAAGGAGGTAGGTGAGGCTGTGGCTAAGCG 
CTTTCATCTGCCTGGTATCAGACAGGGGTACGGGCTAACAGAAACAA 
CTTCTGCTATTCTGATTACACCAGAGGGCGATGACAAACCCGGGGCT 

GTAGGGAAAGTGGTGCCCTTTTTTGAAGCCAAAGTAGTTGATCTTGAT
ACCGGTAAGACACTAGGGGTGAACCAGCGTGGTGAACTGTGTGTGCG 
GGGCCCTATGATTATGTCGGGGTACGTTAACAACCCCGAAGCTACAA 
ATGCTCTCATAGACAAGGACGGGTGGCTTCATAGCGGCGACATTGCC 
TACTGGGACGAGGATGAGCATTTCTTCATCGTGGACAGACTGAAGTC 

GTTGATCAAATACAAGGGGTATCAAGTAGCTCCTGCCGAGCTTGAGT
CCATTCTGCTTCAACACCCCAATATCTTCGATGCTGGGGTGGCTGGGC 
TGCCTGATGATGATGCTGGAGAGCTGCCTGCTGCTGTAGTAGTGCTTG 
AGCATGGTAAGACAATGACAGAGAAGGAGATCGTGGATTATGTGGCT 
TCACAAGTGACAACAGCTAAGAAACTCCGAGGTGGCGTTGTGTTTGT 

GGATGAGGTGCCTAAAGGGCTCACTGGCAAGCTGGATGCCAGAAAA
ATTCGAGAGATTCTCATTAAGGCTAAGAAGGGTGGAAAGATTGCTGT 
GTAATAGTTCTAGA (SEQ ID NO:94) 
hluc+ver2B8 has the following sequence:

AAAGCCACCATGGAaGATGCcAAaAAcATTAAGAAGGGGCCTGCTCCc 
TTcTAcCCTCTtGAaGATGGGACtGCtGGcGAGCAaCTtCAcAAaGCTATGA 
AGcGgTATGCTCTtGTGCCaGGgACAATTGCgTTcACgGATGCTCAcATTG
AaGTaGAcATcACATAcGCTGAGTATTTTGAGATGTCgGTGcGgCTGGCa 
GAaGCTATGAAGcGcTATGGGCTGAATACAAAcCATAGAATTGTaGTGT 
GcagTGAGAAcTCgtTGCAGTTcTTTATGCCcGTGCTGGGGGCTCTcTTcAT 
cGGGGTGGCTGTGGCTCCTGCTAAcGAcATcTAcAAcGAGcGAGAGCTgt 

TGAAcTCgATGGGGATcTCTCAGCCTACAGTGGTGTTTGTGagTAAGAA
aGGGCTtCAaAAGATTCTcAATGTGCAaAAGAAGCTaCCgATcATaCAaAA 
GATcATcATcATGGAtagcAAGACcGAcTAcCAGGGGTTTCAGTCcATGTA 
cACATTTGTaACcTCTCATCTGCCTCCTGGcTTcAAcGAGTAcGAcTTcGT 
GCCcGAGTCTTTcGAcAGgGAcAAaACgATTGCTCTGATcATGAAcagcTCc 

GGGTCTACcGGGCTGCCTAAGGGtGTaGCTCTGCCcCATcGAACAGCTT
GTGTGAGATTcTCTCATGCcAGgGAcCCgATcTTtGGaAAcCAGATcATcC 
CTGAcACtGCTATTCTGTCgGTgGTGCCcTTTCATCATGGGTTTGGGATG 
TTcACAACACTGGGaTAccTcATtTGcGGGTTTAGAGTGGTGCTcATGTAT 
AGgTTTGAaGAaGAaCTaTTccTacGcTCTtTGCAaGATTATAAGATTCAGT 

CTGCTCTGCTGGTGCCaACACTaTTcTCTTTTTTTGCTAAGTCTACgCTcA
TaGAcAAGTATGActTGTCcAActTGCAcGAGATTGCTTCTGGcGGaGCaCC 
TCTGTCTAAGGAGGTaGGtGAGGCTGTGGCTAAGcGcTTTCATCTGCCT 
GGtATcAGACAGGGGTAcGGGCTaACAGAaACAACtTCTGCTATTCTGAT 
TACACCaGAGGGcGATGAcAAaCCtGGGGCTGTaGGGAAaGTGGTGCCcT 

TTTTTGAaGCcAAaGTaGTtGATCTtGATACcGGtAAGACACTaGGGGTGA
AcCAGcGtGGtGAaCTGTGTGTGcGgGGcCCTATGATTATGTCgGGGTAcG 
TtAAcAAcCCcGAaGCTACAAATGCTCTcATaGAcAAGGAcGGgTGGcTtC 
ATagcGGcGAcATTGCcTAcTGGGAcGAGGATGAGCATTTcTTcATcGTGG 
AcAGACTGAAGTCgtTGATcAAaTAcAAGGGGTATCAaGTaGCTCCTGCc 

GAGCTtGAgTCcATTCTGCTtCAaCAcCCcAAtATcTTcGATGCTGGGGTGG

CTGGGCTGCCTGATGATGATGCTGGaGAGcTGCCTGCTGCTGTaGTaGT 

GCTtGAGCAtGGtAAGACAATGACAGAGAAGGAGATcGTGGATTATGT 

GGCTTCaCAaGTGACAACAGCTAAGAAaCTccGAGGtGGcGTtGTGTTTG 

TGGATGAGGTGCCTAAaGGGCTcACtGGcAAGCTGGATGCcAGAAAaAT
TcGAGAGATTCTcATTAAGGCTAAGAAGGGtGGaAAGATTGCTGTGTA
ATAgTTCTAGA (SEQ ID NO:31). 

hluc+ver2BF8 was modified to yield hluc+ver2BF9. 


hluc+ver2B9 has the following sequence:

AAAGCCACCATGGAaGATGCcAAaAAcATTAAGAAGGGGCCTGCTCCc 

TTcTAcCCTCTtGAaGATGGGACtGCtGGcGAGCAaCTtCAcAAaGCTATGA 

AGcGgTATGCTCTtGTGCCaGGgACAATTGCgTTcACgGATGCTCAcATTG 

AaGTaGAcATcACATAcGCTGAGTATTTTGAGATGTCgGTGcGgCTGGCa
GAaGCTATGAAGcGcTATGGGCTGAATACAAAcCATAGAATTGTaGTGT 
GcagTGAGAAcTCgtTGCAGTTcTTTATGCCcGTGCTGGGGGCTCTcTTcAT 
tGGGGTGGCTGTGGCTCCTGCTAAtGAcATcTAcAAcGAGcGAGAGCTgtT 
GAAcagtATGGGGATcTCTCAGCCTACAGTGGTGTTTGTGagTAAGAAaG 

GGCTtCAaAAGATTCTcAATGTGCAaAAGAAGCTaCCgATcATaCAaAAG
ATcATcATcATGGAtagcAAGACcGAcTAcCAGGGGTTTCAGTCcATGTAc 
ACATTTGTaACcTCTCATCTGCCTCCTGGcTTcAAtGAGTAtGAcTTcGTG 
CCcGAGTCTTTcGAcAGgGAcAAaACgATTGCTCTGATcATGAAcagcagtG 
GGTCTACcGGGCTGCCTAAGGGtGTaGCTCTGCCcCATcGAACAGCTTG 

TGTGAGATTcTCTCATGCcAGgGAcCCgATcTTtGGaAAcCAGATcATcCCT
GAcACtGCTATTCTGTCgGTgGTGCCcTTTCATCATGGGTTTGGGATGTT 
cACAACACTGGGaTAccTcATtTGcGGGTTTAGAGTGGTGCTcATGTATA 
GgTTTGAaGAaGAaCTaTTccTacGcTCTtTGCAaGATTATAAGATTCAGTC 
TGCTCTGCTGGTGCCaACACTaTTcTCTTTTTTTGCTAAGTCTACgCTcAT 

aGAcAAGTATGActTGTCcAActTGCAcGAGATTGCTTCTGGcGGaGCaCCT
CTGTCTAAGGAGGTaGGtGAGGCTGTGGCTAAGcGcTTTCATCTGCCTG 
GtATcAGACAGGGGTAcGGGCTaACAGAaACAACtTCTGCTATTCTGATT 
ACACCaGAGGGcGATGAcAAaCCtGGGGCTGTaGGGAAaGTGGTGCCcTT 
TTTTGAaGCcAAaGTaGTtGATCTtGATACcGGtAAGACACTaGGGGTGAA 

cCAGaGaGGtGAatTGTGTGTGaGgGGcCCTATGATTATGTCgGGGTAcGTt
AAcAAcCCcGAaGCTACAAATGCTCTcATaGAcAAGGAcGGgTGGcTtCAT 
agtGGaGAtATTGCcTAcTGGGAtGAaGATGAGCATTTcTTcATcGTGGAcA 
GACTGAAGTCgtTGATcAAaTAcAAGGGGTATCAaGTaGCTCCTGCcGAG 
CTtGAgTCcATTCTGCTtCAaCAcCCcAAtATcTTcGATGCTGGGGTGGCTG
GGCTGCCTGATGATGATGCTGGaGAGcTGCCTGCTGCTGTaGTaGTGCTt 
GAGCAtGGtAAGACAATGACAGAGAAGGAGATcGTGGATTATGTGGCT 
TCaCAaGTGACAACAGCTAAGAAaCTccGAGGtGGcGTtGTGTTTGTGGA 
TGAGGTGCCTAAaGGGCTcACtGGcAAGCTGGATGCcAGAAAaATTcGA
GAGATTCTcATTAAGGCTAAGAAGGGtGGaAAGATTGCTGTGTAATAgT 
TCTAGA (SEQ ID NO:32). 

The BglII site in hluc+ver2BF9 was removed, resulting in hluc+ver2BF10.
hluc+ver2BF10 demonstrated poor expression.

hluc+ver2B10 has the following sequence:

AAAGCCACCATGGAaGATGCcAAaAAcATTAAGAAGGGGCCTGCTCCc 
TTcTAcCCTCTtGAaGATGGGACtGCtGGcGAGCAaCTtCAcAAaGCTATGA 

AGcGgTATGCTCTtGTGCCaGGgACAATTGCgTTcACgGATGCTCAcATTG
AaGTaGAcATcACATAcGCTGAGTATTTTGAGATGTCgGTGcGgCTGGCa 
GAaGCTATGAAGcGcTATGGGCTGAATACAAAcCATAGAATTGTaGTGT 
GcagTGAGAAcTCgtTGCAGTTcTTTATGCCcGTGCTGGGGGCTCTcTTcAT
tGGGGTGGCTGTGGCTCCTGCTAAtGAcATcTAcAAcGAGcGAGAGCTgtT 

GAAcagtATGGGGATcTCTCAGCCTACAGTGGTGTTTGTGagTAAGAAaG
GGCTtCAaAAGATTCTcAATGTGCAaAAGAAGCTaCCgATcATaCAaAAG 
ATcATcATcATGGAtagcAAGACcGAcTAcCAGGGGTTTCAGTCcATGTAc 
ACATTTGTaACcTCTCATCTGCCTCCTGGcTTcAAtGAGTAtGAcTTcGTG 
CCcGAGTCTTTcGAcAGgGAcAAaACgATTGCTCTGATcATGAAcagcagtG 

GGTCTACcGGGCTGCCTAAGGGtGTaGCTCTGCCcCATcGAAGAGCTTG
TGTGAGATTcTCTCATGCcAGgGAcCCgATcTTtGGaAAcCAGATcATcCCT 
GAcACtGCTATTCTGTCgGTgGTGCCcTTTCATCATGGGTTTGGGATGTT 
cACAACACTGGGaTAccTcATtTGcGGGTTTAGAGTGGTGCTcATGTATA 
GgTTTGAaGAaGAaCTaTTccTacGcTCTtTGCAaGATTATAAGATTCAGTC 

TGCTCTGCTGGTGCCaACACTaTTcTCTTTTTTTGCTAAGTCTACgCTcAT
aGAcAAGTATGActTGTCcAActTGCAcGAGATTGCTTCTGGcGGaGCaCCT 
CTGTCTAAGGAGGTaGGtGAGGCTGTGGCTAAGcGcTTTCATCTGCCTG 
GtATcAGACAGGGGTAcGGGCTaACAGAaACAACtTCTGCTATTCTGATT 
ACACCaGAGGGcGATGAcAAaCCtGGGGCTGTaGGGAAaGTGGTGCCcTT 
TTTTGAaGCcAAaGTaGTtGATCTtGATACcGGtAAGACACTaGGGGTGAA 
cCAGaGaGGtGAatTGTGTGTGaGgGGcCCTATGATTATGTCgGGGTAcGTt 
AAcAAcCCcGAaGCTACAAATGCTCTcATaGAcAAGGAcGGgTGGcTtCAT 
agtGGaGAtATTGCcTAcTGGGAtGAaGATGAGCATTTcTTcATcGTGGAcA
GACTGAAGTCgtTGATcAAaTAcAAGGGGTATCAaGTaGCTCCTGCcGAG 
CTtGAgTCcATTCTGCTtCAaCAcCCcAAtATcTTcGATGCTGGGGTGGCTG 
GGCTGCCTGATGATGATGCTGGaGAGcTGCCTGCTGCTGTaGTaGTGCTt 
GAGCAtGGtAAGACAATGACAGAGAAGGAGATcGTGGATTATGTGGCT 
TCaCAaGTGACAACAGCTAAGAAaCTccGAGGtGGcGTtGTGTTTGTGGA
TGAGGTGCCTAAaGGaCTcACtGGcAAGCTGGATGCcAGAAAaATTcGAG 
AGATTCTcATTAAGGCTAAGAAGGGtGGaAAGATTGCTGTGTAATAgTT 
CTAGA (SEQ ID NO:33). 

Table 11

Summary of Firefly Luciferase Constructs

Firefly luciferase   Number of consensus     Number of           CG dinucleotides
gene                 transcription factor    promoter modules*   (possible methylation sites)
                     binding sites
Luc+                 287                     7                   97
hluc+ver2AF8         3                       0                   132
hluc+ver2BF10        3                       0                   43

*Promoter modules are defined as composite regulatory elements, each with 2 TFBS separated by a spacer, which have been shown to exhibit synergistic or antagonistic function.
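The CG dinucleotide column of Table 11 counts possible methylation sites. A count of this kind can be reproduced with a short scan. The following is a minimal Python sketch: the function name is hypothetical, and the source does not state which tool produced the Table 11 counts. Because "CG" is its own reverse complement, counting on one strand gives the same number as counting on the other.

```python
def count_cg_dinucleotides(seq: str) -> int:
    """Count CpG dinucleotides ('CG') in a DNA sequence, ignoring case.

    Lowercase letters (used in this document to mark changed bases) are
    treated like uppercase. 'CG' is palindromic, so the count is the same
    for either strand of a double-stranded sequence.
    """
    s = seq.upper()
    return sum(1 for i in range(len(s) - 1) if s[i:i + 2] == "CG")


print(count_cg_dinucleotides("ACGCGT"))            # 2
print(count_cg_dinucleotides("cctgcaggCCACCATG"))  # 0
```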

Example 4 

Synthetic Selectable Polypeptide Genes 

Design Process 
Define sequences

Protein sequence that should be maintained:

- Neo: from neo gene of pCI-neo (Promega) (SEQ ID NO:1)

- Hyg: from hyg gene of pcDNA3.1/Hygro (Invitrogen) (SEQ ID NO:6)
DNA flanking regions for starting sequence:
- 5' end: Kozak sequence from neo gene of pCI-neo (GCCACCATGA; SEQ ID NO:34), PflMI site (CCANNNNNTGG; SEQ ID NO:35), add Ns at end (to avoid search algorithm errors and keep ORF1):

neo/hyg: ^Jr^rr.AnnnnnT GCKlCACC-ATG^ (SEQ ID NO:36)
Change: replace PflMI with SbfI (CCTGCAGG)

- 3' end: two stop codons (at least one TAA), PflMI site (not compatible with that at 5' end to allow directional cloning), add Ns at end (to avoid search algorithm errors):

neo/hyg: TAATAACCAnnnnnTGGNNN (SEQ ID NO:37)
Change: replace PflMI with AflII (CTTAAG)
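PflMI recognizes an interrupted site (CCANNNNNTGG), so a plain substring search does not find it. The minimal Python sketch below (function and dictionary names are hypothetical) converts an IUPAC recognition sequence into a regular expression and reports every start position on the given strand. The three sites named above are palindromic, so scanning one strand is enough for them; a non-palindromic site would also need a reverse-complement scan. The examples use the hneo flanks shown later in this document and show that the SbfI site contains a PstI site (CTGCAG).

```python
from __future__ import annotations

import re

# IUPAC nucleotide codes mapped to regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Recognition sequences named in the flanking-region design
SITES = {"PflMI": "CCANNNNNTGG", "SbfI": "CCTGCAGG", "AflII": "CTTAAG"}


def site_pattern(site: str) -> re.Pattern:
    """Compile an IUPAC site; the lookahead lets overlapping hits be reported."""
    body = "".join(IUPAC[base] for base in site.upper())
    return re.compile(f"(?=({body}))")


def find_sites(seq: str, site: str) -> list[int]:
    """Return 0-based start positions of `site` on the given strand."""
    return [m.start() for m in site_pattern(site).finditer(seq.upper())]


print(find_sites("CCACTCAGTGGCCACCATGATC", SITES["PflMI"]))   # [0]
print(find_sites("TTCTTCTAATAACCAGTCTCTGG", SITES["PflMI"]))  # [12]
print(find_sites("cctgcaggCCACCATG", SITES["SbfI"]))          # [0]
print(find_sites("cctgcaggCCACCATG", "CTGCAG"))               # [1]
```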

Define codon usage 

Codon usage was obtained from the Codon Usage Database (http://www.kazusa.or.jp/codon/),
based on GenBank Release 131.0 [15 August 2002] (Nakamura et al., 2000).

Codon usage tables were downloaded for:

HS: Homo sapiens [gbpri] 50,031 CDS's (21,930,294 codons)
MM: Mus musculus [gbrod] 23,113 CDS's (10,345,401 codons)
EC: Escherichia coli [gbbct] 11,985 CDS's (3,688,954 codons)
EC K12: Escherichia coli K12 [gbbct] 4,291 CDS's (1,363,716 codons)

HS and MM were compared and found to be closely similar: use HS table.
EC and EC K12 were compared and found to be closely similar: use EC K12 table.

Codon selection strategy: 

The overall strategy is to adapt codon usage for optimal expression in mammalian cells while avoiding low-usage E. coli codons. One "best" codon was selected for each amino acid and used to back-translate the desired protein sequence to yield a starting gene sequence.

Strategy A was chosen for the design of the neo and hyg genes (see Table 12). (Strategy A: codon bias optimized, with emphasis on codons showing the highest usage frequency in HS. Best codons are those with the highest usage in HS, unless a codon with slightly lower usage has substantially higher usage in E. coli.)



Table 12

Amino acid   Codon choices in   Codon choices in codon bias
             Examples 1-2       optimized strategy A
Gly          GGC/GGT            GGC
Glu          GAG                GAG
Asp          GAC                GAC
Val          GTG/GTC            GTG
Ala          GCC/GCT            GCC
Arg          CGC/CGT            CGC
Ser          TCT/AGC            AGC
Lys          AAG                AAG
Asn          AAC                AAC
Ile          ATC/ATT            ATC
Thr          ACC/ACT            ACC
Cys          TGC                TGC
Tyr          TAC                TAC
Leu          CTG/TTG            CTG
Phe          TTC                TTC
Gln          CAG                CAG
His          CAC                CAC
Pro          CCA/CCT            CCC
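Back-translation with a single "best" codon per residue is a direct table lookup. The source performs this step in Vector NTI 8.0 with a custom codon usage table. The minimal Python sketch below is a standalone illustration of the same idea, not Vector NTI's behavior. It uses the Strategy A column of Table 12 and adds ATG (Met) and TGG (Trp), the only codons for those residues. The `back_translate` name and its `stop` default of TAA TAA (taken from the 3' flank definition above) are assumptions. The example reproduces the first 11 codons of hneo.

```python
# "Best" codon per amino acid under Strategy A (Table 12). Met and Trp are
# added because each has only one codon (ATG, TGG).
STRATEGY_A = {
    "G": "GGC", "E": "GAG", "D": "GAC", "V": "GTG", "A": "GCC",
    "R": "CGC", "S": "AGC", "K": "AAG", "N": "AAC", "I": "ATC",
    "T": "ACC", "C": "TGC", "Y": "TAC", "L": "CTG", "F": "TTC",
    "Q": "CAG", "H": "CAC", "P": "CCC", "M": "ATG", "W": "TGG",
}


def back_translate(protein: str, table: dict = STRATEGY_A, stop: str = "TAATAA") -> str:
    """Back-translate a one-letter protein sequence, one fixed codon per residue.

    `stop` defaults to two stop codons (TAA TAA); pass "" to omit it.
    """
    codons = []
    for residue in protein.upper():
        if residue not in table:
            raise ValueError(f"no codon defined for residue {residue!r}")
        codons.append(table[residue])
    return "".join(codons) + stop


print(back_translate("MIEQDGLHAGS", stop=""))
# ATGATCGAGCAGGACGGCCTGCACGCCGGCAGC  (first 11 codons of hneo)
```

Using one codon per residue yields a very regular, GC-rich sequence (for example, runs such as GCCGCCGCC in hneo); individual codons are then replaced during the motif-removal rounds described below.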



Generate starting gene sequences 

Use custom codon usage table in Vector NTI 8.0 (Informax) ("Strategy A") 
Back-translate neo and hyg protein sequences 
Neo (based on neomycin gene from Promega's pCI-neo) 
MIEQDGLHAGSPAAWVERLFGYDWAQQTIGCSDAAVFRLSAQGRPVLFVK
TDLSGALNELQDEAARLSWLATTGVPCAAVLDVVTEAGRDWLLLGEVPGQ
DLLSSHLAPAEKVSIMADAMRRLHTLDPATCPFDHQAKHRIERARTRMEA
GLVDQDDLDEEHQGLAPAELFARLKARMPDGEDLVVTHGDACLPNIMVEN
GRFSGFIDCGRLGVADRYQDIALATRDIAEELGGEWADRFLVLYGIAAPD
SQRIAFYRLLDEFF (SEQ ID NO:2) and encoded by

Atgattgaacaagatggattgcacgcaggttctccggccgcttgggtggagaggctattcggctatgactgggcac 
aacagacaatcggctgctctgatgccgccgtgttccggctgtcagcgcaggggcgcccggttctttttgtc^
gacctgtccggtgccctgaatgaactgcaggacgaggcagcgcggctatcgtggctggccacgacgggcgttcct 
tgcgcagctgtgctcgacgttgtcactgaagcgggaagggactggctgctattgggcgaagtgccggggcaggat 
ctcctgtcatctcaccttgctcctgccgagaaagtatccatcatggctgatgcaatgcggcggctgcatacgcttgatc 
cggctacctgcccattcgaccaccaagcgaaacatcgcatcgagcgagcacgtactcggatggaagccggtcttgt 
cgatcaggatgatctggacgaagagcatcaggggctcgcgccagccgaactgttcgccaggctcaaggcgcgcat
gcccgacggcgaggatctcgtcgtgacccatggcgatgcctgcttgccgaatatcatggtggaaaatggccgctttt 
ctggattcatcgactgtggccggctgggtgtggcggaccgctatcaggacatagcgttggctacccgtgatattgctg 
aagagcttggcggcgaatgggctgaccgcttcctcgtgctttacggtatcgccgctcccgattcgcagcgcatcgcc 
ttctatcgccttcttgacgagttcttctga (SEQ ID NO:1)

Hyg (based on hygromycin gene from Invitrogen's pcDNA3.1/Hygro) 
MKKPELTATSVEKFLIEKFDSVSDLMQLSEGEESRAFSFDVGGRGYVLRV
NSCADGFYKDRYVYRHFASAALPIPEVLDIGEFSESLTYCISRRAQGVTL
QDLPETELPAVLQPVAEAMDAIAAADLSQTSGFGPFGPQGIGQYTTWRDF
ICAIADPHVYHWQTVMDDTVSASVAQALDELMLWAEDCPEVRHLVHADFG
SNNVLTDNGRITAVIDWSEAMFGDSQYEVANIFFWRPWLACMEQQTRYFE
RRHPELAGSPRLRAYMLRIGLDQLYQSLVDGNFDDAAWAQGRCDAIVRSG
AGTVGRTQIARRSAAVWTDGCVEVLADSGNRRPSTRPRAKE
(SEQ ID NO:7) encoded by

Atgaaaaagcctgaactcaccgcgacgtctgtcgagaagtttctgatcgaaaagttcgacagcgtctccgacctgat 
gcagctctcggagggcgaagaatctcgtgctttcagcttcgatgtaggagggcgtggatatgtcctgcgggtaaata 
gctgcgccgatggtttctacaaagatcgttatgtttatcggcactttgcatcggccgcgctcccgattccggaagtgctt 
gacattggggaattcagcgagagcctgacctattgcatctcccgccgtgcacagggtgtcacgttgcaagacctgcc 
tgaaaccgaactgcccgctgttctgcagccggtcgcggaggccatggatgcgatcgctgcggccgatcttagccag
acgagcgggttcggcccattcggaccgcaaggaatcggtcaatacactacatggcgtgatttcatatgcgcgattgc 
tgatccccatgtgtatcactggcaaactgtgatggacgacaccgtcagtgcgtccgtcgcgcaggctctcgatgagc 
tgatgctttgggccgaggactgccccgaagtccggcacctcgtgcacgcggatttcggctccaacaatgtcctgacg 
gacaatggccgcataacagcggtcattgactggagcgaggcgatgttcggggattcccaatacgaggtcgccaac
atcttcttctggaggccgtggttggctt^ 

aggatcgccgcggctccgggcgtatatgctccgcattggtcttgaccaactctatcagagcttggttgacggcaatttc 
gatgatgcagcttgggcgcagggtcgatgcgacgcaatcgtccgatccggagccgggactgtcgggcgtacacaa 
atcgcccgcagaagcgcggccgtctggaccgatggctgtgtagaagtactcgccgatagtggaaaccgacgcccc
agcactcgtccgagggcaaaggaat (SEQ ID NO:6).

Table 13

Nomenclature of exemplary neo and hyg gene versions

Gene name   Description
neo         from pCI-neo (Promega)
hneo        humanized (codon usage strategy A) ORF
hneo-F      humanized ORF with 5' and 3' flanking regions
hneo-1F     humanized ORF with 5' and 3' flanking regions after first removal of undesired sequence matches
hneo-2F     humanized ORF with 5' and 3' flanking regions after second removal of undesired sequence matches
hneo-3F     humanized ORF with 5' and 3' flanking regions after third removal of undesired sequence matches
hneo-3FB    changed 5' and 3' flanking cloning sites
hyg         from pcDNA3.1/Hygro (Invitrogen)
hhyg        humanized (codon usage strategy A) ORF
hhyg-F      humanized ORF with 5' and 3' flanking regions
hhyg-1F     humanized ORF with 5' and 3' flanking regions after first removal of undesired sequence matches
hhyg-2F     humanized ORF with 5' and 3' flanking regions after second removal of undesired sequence matches
hhyg-3F     humanized ORF with 5' and 3' flanking regions after third removal of undesired sequence matches
hhyg-3FB    changed 5' and 3' flanking cloning sites

"h" indicates humanized codons; "F" indicates the presence of 5' and 3' flanking sequences.

Create starting (codon-optimized) gene sequences: 

hneo (humanized starting gene sequence without flanking regions in hneo-F)
CCACTCAGTGGCCACCATGATCGAGCAGGACGGCCTGCACGCCGGCA 
GCCCCGCCGCCTGGGTGGAGCGCCTGTTCGGCTACGACTGGGCCCAG 
CAGACCATCGGCTGCAGCGACGCCGCCGTGTTCCGCCTGAGCGCCCA
GGGCCGCCCCGTGCTGTTCGTGAAGACCGACCTGAGCGGCGCCCTGA 

ACGAGCTGCAGGACGAGGCCGCCCGCCTGAGCTGGCTGGCCACCACC
GGCGTGCCCTGCGCCGCCGTGCTGGACGTGGTGACCGAGGCCGGCCG 
CGACTGGCTGCTGCTGGGCGAGGTGCCCGGCCAGGACCTGCTGAGCA 
GCCACCTGGCCCCCGCCGAGAAGGTGAGCATCATGGCCGACGCCATG 
CGCCGCCTGCACACCCTGGACCCCGCCACCTGCCCCTTCGACCACCA 

GGCCAAGCACCGCATCGAGCGCGCCCGCACCCGCATGGAGGCCGGC
CTGGTGGACCAGGACGACCTGGACGAGGAGCACCAGGGCCTGGCCC 
CCGCCGAGCTGTTCGCCCGCCTGAAGGCCCGCATGCCCGACGGCGAG 
GACCTGGTGGTGACCCACGGCGACGCCTGCCTGCCCAACATCATGGT 
GGAGAACGGCCGCTTCAGCGGCTTCATCGACTGCGGCCGCCTGGGCG 

TGGCCGACCGCTACCAGGACATCGCCCTGGCCACCCGCGACATCGCC
GAGGAGCTGGGCGGCGAGTGGGCCGACCGCTTCCTGGTGCTGTACGG 
CATCGCCGCCCCCGACAGCCAGCGCATCGCCTTCTACCGCCTGCTGG 
ACGAGTTCTTCTAATAACCAGTCTCTGG (SEQ ID NO:3). 

hhyg (humanized starting gene sequence without flanking regions)

CCACTCAGTGGCCACCATGAAGAAGCCCGAGCTGACCGCCACCAGCG 
TGGAGAAGTTCCTGATCGAGAAGTTCGACAGCGTGAGCGACCTGATG 
CAGCTGAGCGAGGGCGAGGAGAGCCGCGCCTTCAGCTTCGACGTGG 
GCGGCCGCGGCTACGTGCTGCGCGTGAACAGCTGCGCCGACGGCTTC 

TACAAGGACCGCTACGTGTACCGCCACTTCGCCAGCGCCGCCCTGCC
CATCCCCGAGGTGCTGGACATCGGCGAGTTCAGCGAGAGCCTGACCT 
ACTGCATCAGCCGCCGCGCCCAGGGCGTGACCCTGCAGGACCTGCCC
GAGACCGAGCTGCCCGCCGTGCTGCAGCCCGTGGCCGAGGCCATGGA 
CGCCATCGCCGCCGCCGACCTGAGCCAGACCAGCGGCTTCGGCCCCT
TCGGCCCCCAGGGCATCGGCCAGTACACCACCTGGCGCGACTTCATC 
TGCGCCATCGCCGACCCCCACGTGTACCACTGGCAGACCGTGATGGA 
CGACACCGTGAGCGCCAGCGTGGCCCAGGCCCTGGACGAGCTGATGC 
TGTGGGCCGAGGACTGCCCCGAGGTGCGCCACCTGGTGCACGCCGAC
TTCGGCAGCAACAACGTGCTGACCGACAACGGCCGCATCACCGCCGT 
GATCGACTGGAGCGAGGCCATGTTCGGCGACAGCCAGTACGAGGTGG 
CCAACATCTTCTTCTGGCGCCCCTGGCTGGCCTGCATGGAGCAGCAG 
ACCCGCTACTTCGAGCGCCGCCACCCCGAGCTGGCCGGCAGCCCCCG 

CCTGCGCGCCTACATGCTGCGCATCGGCCTGGACCAGCTGTACCAGA
GCCTGGTGGACGGCAACTTCGACGACGCCGCCTGGGCCCAGGGCCGC 
TGCGACGCCATCGTGCGCAGCGGCGCCGGCACCGTGGGCCGCACCCA 
GATCGCCCGCCGCAGCGCCGCCGTGTGGACCGACGGCTGCGTGGAGG 
TGCTGGCCGACAGCGGCAACCGCCGCCCCAGCACCCGCCCCCGCGCC 

AAGGAGTAATAACCAGCTCTTGG (SEQ ID NO:8).

Programs and databases used for identification and removal of sequence motifs 
All from Genomatix Software GmbH (Munich, Germany, http://www.genomatix.de):
GEMS Launcher Release 3.5.2 (June 2003)
MatInspector professional Release 6.2.1 (June 2003)
Matrix Family Library Ver 3.1.2 (June 2003) (incl. 318 vertebrate matrices in 128 families)
ModelInspector professional Release 4.8 (October 2002)
Model Library Ver 3.1 (March 2003) (226 modules)
SequenceShaper tool

User Defined Matrices

Sequence motifs to remove from starting gene sequences 
(In order of priority) 
Restriction enzyme recognition sequences:

See user-defined matrix subset "neo+hyg". Same as those used for design of hluc+ version 2.0.

Generally includes those required for cloning (pGL4) or commonly used for cloning.

Change: also SbfI, AflII, AccIII
Transcription factor binding sequences:

Promoter modules (2 TF binding sites with defined orientation) with default score or greater

Vertebrate TF binding sequences with score of at least core=0.75 / matrix=optimized

Eukaryotic transcription regulatory sites:
Kozak sequence

Splice donor / acceptor sequences in (+) strand
PolyA addition sequences in (+) strand
Prokaryotic transcription regulatory sequences:
E. coli promoters
E. coli promoters 

E. coli RBS (if less than 20 bp upstream of Met codon) 
User-defined matrix subset "neo+hyg" 

Format: Matrix name (core similarity threshold / matrix similarity threshold) 



• U$AatII (0.75/1.00)
• U$BamHI (0.75/1.00)
• U$BglI (0.75/1.00)
• U$BglII (0.75/1.00)
• U$BsaI (0.75/1.00)
• U$BsmAI (0.75/1.00)
• U$BsmBI (0.75/1.00)
• U$BstEII (0.75/1.00)
• U$BstXI (0.75/1.00)
• U$Csp45I (0.75/1.00)
• U$CspI (0.75/1.00)
• U$EC-P-10 (1.00/Optimized)
• U$EC-P-35 (1.00/Optimized)
• U$EC-Prom (1.00/Optimized)
• U$EC-RBS (0.75/1.00)
• U$EcoRI (0.75/1.00)
• U$HindIII (0.75/1.00)
• U$Kozak (0.75/Optimized)
• U$KpnI (0.75/1.00)
• U$MluI (0.75/1.00)
• U$NcoI (0.75/1.00)
• U$NdeI (0.75/1.00)
• U$NheI (0.75/1.00)
• U$NotI (0.75/1.00)
• U$NsiI (0.75/1.00)
• U$PflMI (0.75/1.00)
• U$PmeI (0.75/1.00)
• U$PolyAsig (0.75/1.00)
• U$PstI (0.75/1.00)
• U$SacI (0.75/1.00)
• U$SacII (0.75/1.00)
• U$SalI (0.75/1.00)
• U$SfiI (0.75/1.00)
• U$SgfI (0.75/1.00)
• U$SmaI (0.75/1.00)
• U$SnaBI (0.75/1.00)
• U$SpeI (0.75/1.00)
• U$Splice-A (0.75/Optimized)
• U$Splice-D (0.75/Optimized)
• U$XbaI (0.75/1.00)
• U$XcmI (0.75/1.00)
• U$XhoI (0.75/1.00)
• ALL vertebrates.lib (0.75/Optimized)



User-defined matrix subset "neo+hyg-EC"

Format: Matrix name (core similarity threshold / matrix similarity threshold)
• U$AatII (0.75/1.00)
• U$BamHI (0.75/1.00)
• U$BglI (0.75/1.00)
• U$BglII (0.75/1.00)
• U$BsaI (0.75/1.00)
• U$BsmAI (0.75/1.00)
• U$BsmBI (0.75/1.00)
• U$BstEII (0.75/1.00)
• U$BstXI (0.75/1.00)
• U$Csp45I (0.75/1.00)
• U$CspI (0.75/1.00)
• U$EcoRI (0.75/1.00)
• U$HindIII (0.75/1.00)
• U$Kozak (0.75/Optimized)
• U$KpnI (0.75/1.00)
• U$MluI (0.75/1.00)
• U$NcoI (0.75/1.00)
• U$NdeI (0.75/1.00)
• U$NheI (0.75/1.00)
• U$NotI (0.75/1.00)
• U$NsiI (0.75/1.00)
• U$PflMI (0.75/1.00)
• U$PmeI (0.75/1.00)
• U$PolyAsig (0.75/1.00)
• U$PstI (0.75/1.00)
• U$SacI (0.75/1.00)
• U$SacII (0.75/1.00)
• U$SalI (0.75/1.00)
• U$SfiI (0.75/1.00)
• U$SgfI (0.75/1.00)
• U$SmaI (0.75/1.00)
• U$SnaBI (0.75/1.00)
• U$SpeI (0.75/1.00)
• U$Splice-A (0.75/Optimized)
• U$Splice-D (0.75/Optimized)
• U$XbaI (0.75/1.00)
• U$XcmI (0.75/1.00)
• U$XhoI (0.75/1.00)
• ALL vertebrates.lib (0.75/Optimized)

User-defined matrix subset "pGL4-072503" 

Format: Matrix name (core similarity threshold / matrix similarity threshold) 

• U$AatII (0.75/1.00)
• U$AccIII (0.75/1.00)
• U$AflII (0.75/1.00)
• U$BamHI (0.75/1.00)
• U$BglI (0.75/1.00)
• U$BglII (0.75/1.00)
• U$BsaI (0.75/1.00)
• U$BsmAI (0.75/1.00)
• U$BsmBI (0.75/1.00)
• U$BstEII (0.75/1.00)
• U$BstXI (0.75/1.00)
• U$Csp45I (0.75/1.00)
• U$CspI (0.75/1.00)
• U$EC-P-10 (1.00/Optimized)
• U$EC-P-35 (1.00/Optimized)
• U$EC-Prom (1.00/Optimized)
• U$EC-RBS (0.75/1.00)
• U$EcoRI (0.75/1.00)
• U$HindIII (0.75/1.00)
• U$Kozak (0.75/Optimized)
• U$KpnI (0.75/1.00)
• U$MluI (0.75/1.00)
• U$NcoI (0.75/1.00)
• U$NdeI (0.75/1.00)
• U$NheI (0.75/1.00)
• U$NotI (0.75/1.00)
• U$NsiI (0.75/1.00)
• U$PflMI (0.75/1.00)
• U$PmeI (0.75/1.00)
• U$PolyAsig (0.75/1.00)
• U$PstI (0.75/1.00)
• U$SacI (0.75/1.00)
• U$SacII (0.75/1.00)
• U$SalI (0.75/1.00)
• U$SbfI (0.75/1.00)
• U$SfiI (0.75/1.00)
• U$SgfI (0.75/1.00)
• U$SmaI (0.75/1.00)
• U$SnaBI (0.75/1.00)
• U$SpeI (0.75/1.00)



PCT/US2005/033218 



WO 2006/034061 



PCT/US2005/033218 



• U$Splice-A (0.75/Optimized)
• U$Splice-D (0.75/Optimized)
• U$XbaI (0.75/1.00)
• U$XcmI (0.75/1.00)
• U$XhoI (0.75/1.00)
• ALL vertebrates.lib

Strategy for removal of sequence motifs 

The undesired sequence motifs specified above were removed from the starting gene sequence by selecting alternate codons that allowed retention of the specified protein and flanking sequences. Alternate codons were selected so as to conform to the overall codon selection strategy as much as possible.
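The source performs this step with SequenceShaper, which works with Genomatix weight matrices and similarity thresholds. The simpler Python sketch below illustrates the underlying idea for exact string motifs only, such as a 6-bp restriction site. The hypothetical `remove_motif` tries single synonymous codon swaps inside the first occurrence of the motif and accepts the first swap that lowers the total number of unwanted matches, so it never trades one match for another. Unlike the described procedure, it does not rank alternatives by the Strategy A codon preferences, and it does not escalate to multi-codon changes.

```python
from __future__ import annotations

from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(orf: str) -> str:
    """Translate an in-frame ORF codon by codon ('*' marks stop codons)."""
    s = orf.upper()
    return "".join(CODON_TO_AA[s[i:i + 3]] for i in range(0, len(s) - 2, 3))


def count_hits(seq: str, motifs: list[str]) -> int:
    """Count exact, possibly overlapping, occurrences of all motifs."""
    return sum(seq.startswith(m, i) for m in motifs for i in range(len(seq)))


def remove_motif(orf: str, motif: str, avoid: tuple[str, ...] = ()) -> str | None:
    """Break the first occurrence of `motif` with one synonymous codon swap.

    Returns the unchanged ORF if the motif is absent, the first variant with
    fewer total hits of `motif` plus `avoid`, or None if no single swap works.
    """
    seq = orf.upper()
    motifs = [motif.upper(), *(m.upper() for m in avoid)]
    start = seq.find(motif.upper())
    if start < 0:
        return seq
    baseline = count_hits(seq, motifs)
    first = start // 3
    last = min((start + len(motif) - 1) // 3, len(seq) // 3 - 1)
    for k in range(first, last + 1):
        old = seq[3 * k:3 * k + 3]
        for alt in SYNONYMS[CODON_TO_AA[old]]:
            if alt == old:
                continue
            candidate = seq[:3 * k] + alt + seq[3 * k + 3:]
            if count_hits(candidate, motifs) < baseline:
                return candidate
    return None


# EcoRI site (GAATTC) spanning the Glu and Phe codons; GAA -> GAG removes it
print(remove_motif("ATGGAATTCAAA", "GAATTC"))  # ATGGAGTTCAAA
```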



General steps: 

- Identify undesired sequence matches with MatInspector using matrix family subset "neo+hyg" or "neo+hyg-EC" and with ModelInspector using default settings.

- Identify possible replacement codons to remove undesired sequence matches with SequenceShaper (keep ORF).

- Incorporate changes into a new version of the synthetic gene sequence and re-analyze with MatInspector and ModelInspector.
Specific steps:

- First try to remove undesired sequence matches using subset "neo+hyg-EC" and SequenceShaper default remaining thresholds (0.70/Opt-0.20).

- For sequence matches that cannot be removed with this approach, use lower SequenceShaper remaining thresholds (e.g. 0.70/Opt-0.05).

- For sequence matches that still cannot be removed, try different combinations of manually chosen replacement codons (especially if more than 3 base changes might be needed). If that introduces new sequence matches, try to remove those using the steps above (a different starting sequence sometimes allows a different removal solution).

- Use subset "neo+hyg" to check whether problematic E. coli sequence matches were introduced, and if so, try to remove them using an approach analogous to that described above for non-E. coli sequences.

Use an analogous strategy for the flanking (non-ORF) sequences.

Final check with subset "pGL4-072503" after change in flanking cloning sites.

After codon optimizing neo and hyg, hneo and hhyg were obtained. 

Regulatory sequences were removed from hneo and hhyg, yielding hneo-1F and hhyg-1F (the corresponding sequences without flanking regions are SEQ ID Nos. 38 and 30, respectively). Regulatory sequences were removed from hneo-1F and hhyg-1F, yielding hneo-2F and hhyg-2F (the corresponding sequences without flanking regions are SEQ ID Nos. 39 and 42, respectively). Regulatory sequences were removed from hneo-2F and hhyg-2F, yielding hneo-3F and hhyg-3F. hneo-3F and hhyg-3F were further modified by altering the 5' and 3' cloning sites, yielding hneo-3FB and hhyg-3FB:

hneo-3 (after 3rd round of sequence removal, subset neo+hyg) has the following sequence:

CCACTCcGTGGCCACCATGATCGAaCAaGACGGCCTcCAtGCtGGCAGtC
CCGCaGCtTGGGTcGAaCGCtTGTTCGGgTACGACTGGGCCCAGCAGAC 
CATCGGaTGtAGCGAtGCgGCCGTGTTCCGtCTaAGCGCtCAaGGCCGgCC 
CGTGCTGTTCGTGAAGACCGACCTGAGCGGCGCCCTGAACGAGCTtCA 

aGACGAGGCtGCCCGCCTGAGCTGGCTGGCCACCACCGGtGTaCCCTGC
GCCGCtGTGtTGGAtGTtGTGACCGAaGCCGGCCGgGACTGGCTGCTGCT 
GGGCGAGGTcCCtGGCCAGGAtCTGCTGAGCAGCCACCTtGCCCCCGCt 
GAGAAGGTttcCATCATGGCCGAtGCaATGCGgCGCCTGCACACCCTGG 
ACCCCGCtACaTGCCCCTTCGACCACCAGGCtAAGCAtCGgATCGAGCGt 

GCtCGgACCCGCATGGAGGCCGGCCTGGTGGACCAGGACGACCTGGA
CGAGGAGCAtCAGGGCCTGGCCCCCGCtGAaCTGTTCGCCCGCCTGAAa 
GCCCGCATGCCgGACGGtGAGGACCTGGTtGTGACaCAtGGtGAtGCCTG 
CCTcCCtAACATCATGGTcGAGAAtGGcCGCTTCtcCGGCTTCATCGACTG
CGGtCGCCTaGGaGTtGCCGACCGCTACCAGGACATCGCCCTGGCCACC
CGCGACATCGCtGAGGAGCTtGGCGGCGAGTGGGCCGACCGCTTCtTaG 
TctTGTACGGCATCGCaGCtCCCGACAGCCAGCGCATCGCCTTCTACCG 
CCTGCTcGACGAGTTCTTtTAATGACCAGgCTCTGG (SEQ ID NO:4); 

hneo-3FB (change PflMI sites to SbfI at 5' end and AflII at 3' end) has the
following sequence:

cctgcaggCCACCATGATCGAACAAGACGGCCTCCATGCTGGCAGTCCCG 
CAGCTTGGGTCGAACGCTTGTTCGGGTACGACTGGGCCCAGCAGACC 
ATCGGATGTAGCGATGCGGCCGTGTTCCGTCTAAGCGCTCAAGGCCG 

GCCCGTGCTGTTCGTGAAGACCGACCTGAGCGGCGCCCTGAACGAGC
TTCAAGACGAGGCTGCCCGCCTGAGCTGGCTGGCCACCACCGGTGTA 
CCCTGCGCCGCTGTGTTGGATGTTGTGACCGAAGCCGGCCGGGACTG
GCTGCTGCTGGGCGAGGTCCCTGGCCAGGATCTGCTGAGCAGCCACC 
TTGCCCCCGCTGAGAAGGTTTCCATCATGGCCGATGCAATGCGGCGC 

CTGCACACCCTGGACCCCGCTACATGCCCCTTCGACCACCAGGCTAA
GCATCGGATCGAGCGTGCTCGGACCCGCATGGAGGCCGGCCTGGTGG 
ACCAGGACGACCTGGACGAGGAGCATCAGGGCCTGGCCCCCGCTGA 
ACTGTTCGCCCGCCTGAAAGCCCGCATGCCGGACGGTGAGGACCTGG 
TTGTGACACATGGTGATGCCTGCCTCCCTAACATCATGGTCGAGAAT 

GGCCGCTTCTCCGGCTTCATCGACTGCGGTCGCCTAGGAGTTGCCGAC
CGCTACCAGGACATCGCCCTGGCCACCCGCGACATCGCTGAGGAGCT 
TGGCGGCGAGTGGGCCGACCGCTTCTTAGTCTTGTACGGCATCGCAG 
CTCCCGACAGCCAGCGCATCGCCTTCTACCGCCTGCTCGACGAGTTCT 
TTTAATGAgcttaag (SEQ ID NO:5); 

hhyg-3 (after 3rd round of sequence removal, subset neo+hyg) has the following
sequence: 

CCACTCcGTGGCCACCATGAAGAAGCCCGAGCTGACCGCtACCAGCGT 
tGAaAAaTTtCTcATCGAGAAGTTCGACAGtGTGAGCGACCTGATGCAGt 
TgtcgGAGGGCGAaGAgAGCCGaGCCTTCAGCTTCGAtGTcGGCGGaCGC 
GGCTAtGTaCTGCGgGTGAAtAGCTGCGCtGAtGGCTTCTACAAaGACCG
CTACGTGTACCGCCACTTCGCCAGCGCtGCaCTaCCCATCCCCGAaGTGt 
TGGACATCGGCGAGTTCAGCGAGAGCCTGACaTACTGCATCAGtaGaCG 
CGCCCAaGGCGTtACtCTcCAaGACCTcCCCGAaACaGAGCTGCCtGCtGT
GtTaCAGCCtGTcGCCGAaGCtATGGAtGCtATtGCCGCCGCCGACCTcAGt 
CAaACCAGCGGCTTCGGCCCaTTCGGgCCCCAaGGCATCGGCCAGTAC 
ACaACCTGGCGgGAtTTCATtTGCGCCATtGCtGAtCCCCAtGTcTACCACT 

GGCAGACCGTGATGGACGACACCGTGtcCGCCAGCGTaGCtCAaGCCCT
GGACGAaCTGATGCTGTGGGCCGAaGACTGtCCCGAGGTGCGCCAcCTc 
GTcCAtGCCGACTTCGGCAGCAACAACGTcCTGACCGACAACGGCCGC 
ATCACCGCCGTaATCGACTGGtcCGAaGCtATGTTCGGgGACAGtCAGTA 
CGAGGTGGCCAACATCTTCTTCTGGCGgCCCTGGCTGGCtTGCATGGA 

GCAGCAGACtCGCTACTTCGAGCGCCGgCAtCCCGAGCTGGCCGGCAG
CCCtCGtCTGCGaGCCTACATGCTGCGCATCGGCCTGGAtCAGCTcTACC 
AGAGCCTcGTGGACGGCAACTTCGACGAtGCtGCCTGGGCtCAaGGCCG 
CTGCGAtGCCATCGTcCGCAGCGGgGCCGGCACCGTcGGtCGCACaCAaA 
TCGCtCGCCGgAGCGCCGCCGTaTGGACCGACGGCTGCGTcGAGGTGCT 

GGCCGACAGCGGCAACCGCCGgCCCAGtACaCGaCCgCGCGCtAAGGAG
TAgTAACCAGgctcTGG (SEQ ID NO:9); and

hhyg-3FB (change PflMI sites to SbfI at 5' end and AflII at 3' end) has the
following sequence: 

cctgcaggCCACCATGAAGAAGCCCGAGCTGACCGCTACCAGCGTTGAAA 
AATTTCTCATCGAGAAGTTCGACAGTGTGAGCGACCTGATGCAGTTG
TCGGAGGGCGAAGAGAGCCGAGCCTTCAGCTTCGATGTCGGCGGACG 
CGGCTATGTACTGCGGGTGAATAGCTGCGCTGATGGCTTCTACAAAG 
ACCGCTACGTGTACCGCCACTTCGCCAGCGCTGCACTACCCATCCCC 
GAAGTGTTGGACATCGGCGAGTTCAGCGAGAGCCTGACATACTGCAT
CAGTAGACGCGCCCAAGGCGTTACTCTCCAAGACCTCCCCGAAACAG
AGCTGCCTGCTGTGTTACAGCCTGTCGCCGAAGCTATGGATGCTATTG 
CCGCCGCCGACCTCAGTCAAACCAGCGGCTTCGGCCCATTCGGGCCC
CAAGGCATCGGCCAGTACACAACCTGGCGGGATTTCATTTGCGCCAT
TGCTGATCCCCATGTCTACCACTGGCAGACCGTGATGGACGACACCG 
TGTCCGCCAGCGTAGCTCAAGCCCTGGACGAACTGATGCTGTGGGCC
GAAGACTGTCCCGAGGTGCGCCACCTCGTCCATGCCGACTTCGGCAG 
CAACAACGTCCTGACCGACAACGGCCGCATCACCGCCGTAATCGACT
GGTCCGAAGCTATGTTCGGGGACAGTCAGTACGAGGTGGCCAACATC
TTCTTCTGGCGGCCCTGGCTGGCTTGCATGGAGCAGCAGACTCGCTAC 
TTCGAGCGCCGGCATCCCGAGCTGGCCGGCAGCCCTCGTCTGCGAGC 
CTACATGCTGCGCATCGGCCTGGATCAGCTCTACCAGAGCCTCGTGG 
ACGGCAACTTCGACGATGCTGCCTGGGCTCAAGGCCGCTGCGATGCC
ATCGTCCGCAGCGGGGCCGGCACCGTCGGTCGCACACAAATCGCTCG
CCGGAGCGCCGCCGTATGGACCGACGGCTGCGTCGAGGTGCTGGCCG 
ACAGCGGCAACCGCCGGCCCAGTACACGACCGCGCGCTAAGGAGTA 
GTAActtaag (SEQ ID NO: 10). 
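Each round of motif removal and the final change of cloning sites must leave the encoded protein unchanged. A minimal check is to extract the first ORF from each construct and compare translations. The Python sketch below assumes that the first ATG in each construct is the start codon, which holds for the 5' flanks shown here since none contains an upstream ATG. The function names are hypothetical, and the example uses short fragments built from the starts of hneo and hneo-3FB, with a stop codon appended, rather than the full sequences.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def extract_orf(seq: str) -> str:
    """Return the ORF from the first ATG through the first in-frame stop codon."""
    s = seq.upper()
    start = s.find("ATG")
    if start < 0:
        return ""
    for i in range(start, len(s) - 2, 3):
        if CODON_TO_AA[s[i:i + 3]] == "*":
            return s[start:i + 3]
    return s[start:]  # no in-frame stop found: return the open tail


def translate(orf: str) -> str:
    """Translate codon by codon; a terminal stop appears as '*'."""
    s = orf.upper()
    return "".join(CODON_TO_AA[s[i:i + 3]] for i in range(0, len(s) - 2, 3))


def same_protein(construct_a: str, construct_b: str) -> bool:
    """True if the first ORFs of both constructs encode the same protein."""
    return translate(extract_orf(construct_a)) == translate(extract_orf(construct_b))


hneo_like = "CCACTCAGTGGCCACCATGATCGAGCAGGACTAATAA"
hneo_3fb_like = "cctgcaggCCACCATGATCGAACAAGACTAATGAgcttaag"
print(translate(extract_orf(hneo_3fb_like)))   # MIEQD*
print(same_protein(hneo_like, hneo_3fb_like))  # True
```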

Analysis of hneo-3FB and hhyg-3FB

hneo-3FB had no matches to transcription factor binding sequences, including promoter modules (GEMS Launcher release 3.5.2, June 2003; vertebrate TF binding sequence families with core similarity 0.75 / matrix similarity optimized; promoter modules with default parameters, i.e. optimized threshold or 80% of maximum score). hhyg-3FB had 4 transcription factor binding sequence matches remaining but no promoter modules (Table 10). The following transcription factor binding sequences were found in hhyg-3FB:
1) V$MINI
Family: Muscle Initiators (2 members)
Best match: Muscle Initiator Sequence 1
Ref: Laura L. Lopez & James W. Fickett, "Muscle-Specific Regulation of Transcription: A Catalog of Regulatory Elements", http://www.cbil.upenn.edu/MTIR/HomePage.html
Position in ORF: -7 to 11

2) V$PAX5
Family: PAX-5/PAX-9 B-cell-specific activating proteins (4 members)
Best match: B-cell-specific activating protein
Ref: MEDLINE 94010299
Position in ORF: 271 to 299

3) V$AREB
Family: Atp1a1 regulatory element binding (4 members)
Best match: AREB6
Ref: MEDLINE 96061934
Position in ORF: 310 to 322

4) V$VMYB
Family: AMV-viral myb oncogene (2 members)
Best match: v-Myb
Ref: MEDLINE 94147510
Position in ORF: 619 to 629

Other sequences remaining in hneo-3F included one E. coli RBS 8 bases upstream of Met (ORF position 334 to 337). hneo-3FB included a splice acceptor site (+) and a PstI site as part of the 5' SbfI cloning site, and one E. coli RBS 8 bases upstream of Met (ORF position 334 to 337). hhyg-3F had no other sequence matches, and hhyg-3FB included a splice acceptor site (+) and a PstI site as part of the 5' SbfI cloning site.

Subsequently, regulatory sequences were removed from hneo-3F and hhyg-3F, yielding hneo-4 and hhyg-4. Regulatory sequences were then removed from hneo-4, yielding hneo-5.



Table 14

Neo        - / 53 / -     - / 10 / -
hneo-F     1 / 61 / 2     0 / 2 / 0
hneo-3F    0 / 0 / 0      0 / 0 / 0
hneo-3FB   0 / 0 / 0      0 / 0 / 0
Hyg        - / 74 / -     - / 3 / -
hhyg-F     1 / 94 / 1     0 / 4 / 0
hhyg-3F    1 / 3 / 0      0 / 0 / 0
hhyg-3FB   1 / 3 / 0      0 / 0 / 0

*Promoter modules are defined as composite regulatory elements, each with 2 transcription factor binding sites separated by a spacer, which have been shown to exhibit synergistic or antagonistic function.
Table 15 summarizes the pairwise identity of the various gene versions.

Table 15 

Pairwise identity of different gene versions 
Comparisons were of open reading frames (ORFs).
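Because the recoded genes keep the reading frame length of the parent ORF, their identity can be computed position by position without gaps. The Python sketch below assumes such an ungapped comparison; the source does not say which program produced Table 15, and sequences with insertions or deletions would need a real alignment instead. The example compares the first 11 codons of hneo and hneo-3 as printed in this document: 6 of 33 positions differ.

```python
def percent_identity(orf_a: str, orf_b: str) -> float:
    """Ungapped percent identity of two equal-length ORFs (case-insensitive)."""
    if len(orf_a) != len(orf_b):
        raise ValueError("ORFs must have equal length for an ungapped comparison")
    if not orf_a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(orf_a.upper(), orf_b.upper()))
    return 100.0 * matches / len(orf_a)


hneo_start = "ATGATCGAGCAGGACGGCCTGCACGCCGGCAGC"   # first 11 codons of hneo
hneo3_start = "ATGATCGAaCAaGACGGCCTcCAtGCtGGCAGt"  # same codons in hneo-3
print(f"{percent_identity(hneo_start, hneo3_start):.1f}%")  # 81.8%
```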
An expression cassette (hNeo-cassette) with a synthetic neomycin gene flanked
by an SV40 promoter and a synthetic poly(A) site is shown below.
GGATCCGTTTGCGTATTGGGCGCTCTTCCGCTGATCTGCGCAGCACCA 
TGGCCTGAAATAACCTCTGAAAGAGGAACTTGGTTAGCTACCTTCTG
AGGCGGAAAGAACCAGCTGTGGAATGTGTGTCAGTTAGGGTGTGGAA
AGTCCCCAGGCTCCCCAGCAGGCAGAAGTATGCAAAGCATGCATCTC 
AATTAGTCAGCAACCAGGTGTGGAAAGTCCCCAGGCTCCCCAGCAGG 
CAGAAGTATGCAAAGCATGCATCTCAATTAGTCAGCAACCATAGTCC 

CGCCCCTAACTCCGCCCATCCCGCCCCTAACTCCGCCCAGTTCCGCCC
ATTCTCCGCCCCATGGCTGACTAATTTTTTTTATTTATGCAGAGGCCG
AGGCCGCCTCTGCCTCTGAGCTATTCCAGAAGTAGTGAGGAGGCTTT 
TTTGGAGGCCTAGGCTTTTGCAAAAAGCTCGATTCTTCTGACACTAGC 
GCCACCATGATCGAACAAGACGGCCTCCATGCTGGCAGTCCCGCAGC 

TTGGGTCGAACGCTTGTTCGGGTACGACTGGGCCCAGCAGACCATCG
GATGTAGCGATGCGGCCGTGTTCCGTCTAAGCGCTCAAGGCCGGCCC 
GTGCTGTTCGTGAAGACCGACCTGAGCGGCGCCCTGAACGAGCTTCA 
AGACGAGGCTGCCCGCCTGAGCTGGCTGGCCACCACCGGCGTACCCT 
GCGCCGCTGTGTTGGATGTTGTGACCGAAGCCGGCCGGGACTGGCTG 

CTGCTGGGCGAGGTCCCTGGCCAGGATCTGCTGAGCAGCCACCTTGC
CCCCGCTGAGAAGGTTTCTATCATGGCCGATGCAATGCGGCGCCTGC 
ACACCCTGGACCCCGCTACCTGCCCCTTCGACCACCAGGCTAAGCAT 
CGGATCGAGCGTGCTCGGACCCGCATGGAGGCCGGCCTGGTGGACCA 
GGACGACCTGGACGAGGAGCATCAGGGCCTGGCCCCCGCTGAACTGT

TCGCCCGACTGAAAGCCCGCATGCCGGACGGTGAGGACCTGGTTGTC
ACACACGGAGATGCCTGCCTCCCTAACATCATGGTCGAGAATGGCCG 
CTTCTCCGGCTTCATCGACTGCGGTCGCCTAGGAGTTGCCGACCGCTA 
CCAGGACATCGCCCTGGCCACCCGCGACATCGCTGAGGAGCTTGGCG 
GCGAGTGGGCCGACCGCTTCTTAGTCTTGTACGGCATCGCAGCTCCC

GACAGCCAGCGCATCGCCTTCTACCGCTTGCTCGACGAGTTCTTTTAA
TGATCTAGAACCGGTCATGGCCGCAATAAAATATCTTTATTTTCATTA 
CATCTGTGTGTTGGTTTTTTGTGTGTTCGAACTAGATGCTGTCGAC 
(SEQ ID NO:44). 

An expression cassette (hPuro-cassette) with a synthetic puromycin gene flanked
by an SV40 promoter and a synthetic poly(A) site is shown below.
GGATCCGTTTGCGTATTGGGCGCTCTTCCGCTGATCTGCGCAGCACCA 
TGGCCTGAAATAACCTCTGAAAGAGGAACTTGGTTAGCTACCTTCTG
AGGCGGAAAGAACCAGCTGTGGAATGTGTGTCAGTTAGGGTGTGGAA
AGTCCCCAGGCTCCCCAGCAGGCAGAAGTATGCAAAGCATGCATCTC 
AATTAGTCAGCAACCAGGTGTGGAAAGTCCCCAGGCTCCCCAGCAGG 
CAGAAGTATGCAAAGCATGCATCTCAATTAGTCAGCAACCATAGTCC

CGCCCCTAACTCCGCCCATCCCGCCCCTAACTCCGCCCAGTTCCGCCC
ATTCTCCGCCCCATGGCTGACTAATTTTTTTTATTTATGCAGAGGCCG 
AGGCCGCCTCTGCCTCTGAGCTATTCCAGAAGTAGTGAGGAGGCTTT 
TTTGGAGGCCTAGGCTTTTGCAAAAAGCTCGATTCTTCTGACACTAGC 
GCCACCATGACCGAGTACAAGCCTACCGTGCGCCTGGCCACTCGCGA 

TGATGTGCCCCGCGCCGTCCGCACTCTGGCCGCCGCTTTCGCCGACTA
CCCCGCTACCCGGCACACCGTGGACCCCGACCGGCACATCGAGCGTG 
TGACAGAGTTGCAGGAGCTGTTCCTGACCCGCGTCGGGCTGGACATC 
GGCAAGGTGTGGGTAGCCGACGACGGCGCGGCCGTGGCCGTGTGGA 
CTACCCCCGAGAGCGTTGAGGCCGGCGCCGTGTTCGCCGAGATCGGC 

CCCCGAATGGCCGAGCTGAGCGGCAGCCGCCTGGCCGCCCAGCAGCA
AATGGAGGGCCTGCTTGCCCCCCATCGTCCCAAGGAGCCTGCCTGGT 
TTCTGGCCACTGTAGGAGTGAGCCCCGACCACCAGGGCAAGGGCTTG 
GGCAGCGCCGTCGTGTTGCCCGGCGTAGAGGCCGCCGAACGCGCCGG 
TGTGCCCGCCTTTCTCGAAACAAGCGCACCAAGAAACCTTCCATTCTA 

CGAGCGCCTGGGCTTCACCGTGACCGCCGATGTCGAGGTGCCCGAGG
GACCTAGGACCTGGTGTATGACACGAAAACCTGGCGCCTAATGATCT 
AGAACCGGTCATGGCCGCAATAAAATATCTTTATTTTCATTACATCTG 
TGTGTTGGTTTTTTGTGTGTTCGAACTAGATGCTGTCGAC (SEQ ID
NO: 11); 


hpuro: 

GCTAGCGCCACCATGACCGAGTACAAGCCCACCGTGCGCCTGGCCAC
CCGCGACGACGTGCCCCGCGCCGTGCGCACCCTGGCCGCCGCCTTCG 
CCGACTACCCCGCCACCCGCCACACCGTGGACCCCGACCGCCACATC 
GAGCGCGTGACCGAGCTGCAGGAGCTGTTCCTGACCCGCGTGGGCCT
GGACATCGGCAAGGTGTGGGTGGCCGACGACGGCGCCGCCGTGGCC 
GTGTGGACCACCCCCGAGAGCGTGGAGGCCGGCGCCGTGTTCGCCGA
GATCGGCCCCCGCATGGCCGAGCTGAGCGGCAGCCGCCTGGCCGCCC 
AGCAGCAGATGGAGGGCCTGCTGGCCCCCCACCGCCCCAAGGAGCCC
GCCTGGTTCCTGGCCACCGTGGGCGTGAGCCCCGACCACCAGGGCAA 
GGGCCTGGGCAGCGCCGTGGTGCTGCCCGGCGTGGAGGCCGCCGAGC 
GCGCCGGCGTGCCCGCCTTCCTGGAGACCAGCGCCCCCCGCAACCTG 
CCCTTCTACGAGCGCCTGGGCTTCACCGTGACCGCCGACGTGGAGGT
GCCCGAGGGCCCCCGCACCTGGTGCATGACCCGCAAGCCCGGCGCCT 
AATGATCTAGA (SEQ ID NO:91); 

hpuro-1:

gctagcgccaccatgaccgagtacaagcctaccgtgcgcctggccactcgcgatgatgtgccccgcgccgtccgc
actctggccgccgctttcgccgactaccccgctacccggcacaccgtggaccccgaccggcacatcgagcgtgtg 
acagagttgcaggagctgttcctgacccgcgtcgggctggacatcggcaaggtgtgggtagccgacgacggcgc 
ggccgtggccgtgtggactacccccgagagcgttgaggccggcgccgtgttcgccgagatcggcccccgaatgg 
ccgagctgagcggcagccgcctggccgcccagcagcaaatggagggcctgcttgccccccatcgtcccaaggag 

cccgcctggtttctggccactgtaggagtgagccccgaccaccagggcaagggcttgggcagcgccgtcgtgttg
cccggcgtagaggccgccgaacgcgccggtgtgcccgcctttctggagacaagcgctccgcgtaaccttccattct 
acgagcgcctgggcttcaccgtgaccgccgatgtcgaggtgcccgagggaccccggacctggtgcatgactcgc 
aagcctggcgcctaatgatctaga (SEQ ID NO:92); and 

hpuro-2:

GCTAGCGCCACCATGACCGAGTACAAGCCTACCGTGCGCCTGGCCAC 
TCGCGATGATGTGCCCCGCGCCGTCCGCACTCTGGCCGCCGCTTTCGC 
CGACTACCCCGCTACCCGGCACACCGTGGACCCCGACCGGCACATCG 
AGCGTGTGACAGAGTTGCAGGAGCTGTTCCTGACCCGCGTCGGGCTG 

GACATCGGCAAGGTGTGGGTAGCCGACGACGGCGCGGCCGTGGCCG
TGTGGACTACCCCCGAGAGCGTTGAGGCCGGCGCCGTGTTCGCCGAG 
ATCGGCCCCCGAATGGCCGAGCTGAGCGGCAGCCGCCTGGCCGCCCA 
GCAGCAAATGGAGGGCCTGCTTGCCCCCCATCGTCCCAAGGAGCCTG 
CCTGGTTTCTGGCCACTGTAGGAGTGAGCCCCGACCACCAGGGCAAG 

GGCTTGGGCAGCGCCGTCGTGTTGCCCGGCGTAGAGGCCGCCGAACG
CGCCGGTGTGCCCGCCTTTCTCGAAACAAGCGCACCAAGAAACCTTC 
CATTCTACGAGCGCCTGGGCTTCACCGTGACCGCCGATGTCGAGGTG 
CCCGAGGGACCTAGGACCTGGTGTATGACACGAAAACCTGGCGCCTA 
ATGATCTAGA (SEQ ID NO:93).

The starting puro sequence (from psiSTRIKE) has SEQ ID NO:15
(atgaccgagt acaagcccac ggtgcgcctc gccacccgcg acgacgtccc ccgggccgta 

cgcaccctcg ccgccgcgtt cgccgactac cccgccacgc gccacaccgt cgacccggac
cgccacatcg agcgggtcac cgagctgcaa gaactcttcc tcacgcgcgt cgggctcgac 
atcggcaagg tgtgggtcgc ggacgacggc gccgcggtgg cggtctggac cacgccggag 
agcgtcgaag cgggggcggt gttcgccgag atcggcccgc gcatggccga gttgagcggt 
tcccggctgg ccgcgcagca acagatggaa ggcctcctgg cgccgcaccg gcccaaggag 

cccgcgtggt tcctggccac cgtcggcgtg tcgcccgacc accagggcaa gggtctgggc
agcgccgtcg tgctccccgg agtggaggcg gccgagcgcg ccggggtgcc cgccttcctg 
gagacctccg cgccccgcaa cctccccttc tacgagcggc tcggcttcac cgtcaccgcc 
gacgtcgagg tgcccgaagg accgcgcacc tggtgcatga cccgcaagcc cggtgcc). 

Other synthetic hyg and neo genes include:
hneo-1:

CCACTCAGTGGCCACCATGATCGAGCAGGACGGCCTcCAtGCtGGCAGt 

CCCGCaGCCTGGGTcGAGCGCtTGTTCGGgTACGACTGGGCCCAGCAG 

ACCATCGGaTGtAGCGAtGCCGCaGTGTTCCGCCTGAGCGCtCAaGGCCG 

gCCCGTGCTGTTCGTGAAGACCGACCTGAGCGGCGCCCTGAACGAGC
TtCAaGACGAGGCtGCCCGCCTGAGCTGGCTGGCCACCACCGGtGTaCC 
CTGCGCCGCtGTGtTGGAtGTtGTGACCGAaGCCGGCCGCGACTGGCTGC 
TGCTGGGCGAGGTGCCtGGCCAGGACCTGCTGAGCAGCCACCTGGCC 
CCCGCtGAGAAGGTGAGCATCATGGCCGACGCCATGCGgCGCCTGCAC 

ACCCTGGACCCCGCtACaTGCCCCTTCGACCACCAGGCtAAGCACCGC
ATCGAGCGgGCtCGgACCCGCATGGAGGCCGGCCTGGTGGACCAGGAC 
GACCTGGACGAGGAGCACCAGGGCCTGGCCCCCGCtGAaCTGTTCGCC 
CGCCTGAAaGCCCGCATGCCgGACGGtGAGGACCTGGTtGTGACaCACG 
GCGACGCCTGCCTcCCtAACATCATGGTcGAGAACGGgCGCTTCtcCGGC 

TTCATCGACTGCGGCCGCCTGGGCGTtGCCGACCGCTACCAGGACATC
GCCCTGGCCACCCGCGACATCGCCGAGGAGCTGGGCGGCGAGTGGG 
CCGACCGCTTCCTGGTctTGTACGGCATCGCaGCtCCCGACAGCCAGCG 
CATCGCCTTCTACCGCCTGCTGGACGAGTTCrrCTAgTAACCAGgCTCT 
GG (SEQ ID NO:38);
hneo-2 

CCACTCcGTGGCCACCATGATCGAaCAaGACGGCCTcCAtGCtGGCAGtC 
CCGCaGCtTGGGTcGAaCGCtTGTTCGGgTACGACTGGGCCCAGCAGAC
CATCGGaTGtAGCGAtGCgGCCGTGTTCCGtCTaAGCGCtCAaGGCCGgCC 
CGTGCTGTTCGTGAAGACCGACCTGAGCGGCGCCCTGAACGAGCTtCA 
aGACGAGGCtGCCCGCCTGAGCTGGCTGGCCACCACCGGtGTaCCCTGC 
GCCGCtGTGtTGGAtGTtGTGACCGAaGCCGGCCGgGACTGGCTGCTGCT 

GGGCGAGGTcCCtGGCCAGGAtCTGCTGAGCAGCCACCTtGCCCCCGCt
GAGAAGGTttcCATCATGGCCGAtGCaATGCGgCGCCTGCACACCCTGG 
ACCCCGCtACaTGCCCCTTCGACCACCAGGCtAAGCAtCGgATCGAGCGt 
GCtCGgACCCGCATGGAGGCCGGCCTGGTGGACCAGGACGACCTGGA 
CGAGGAGCAtCAGGGCCTGGCCCCCGCtGAaCTGTTCGCCCGCCTGAAa 

GCCCGCATGCCgGACGGtGAGGACCTGGTtGTGACaCAtGGaGAtGCCTG
CCTcCCtAACATCATGGTcGAGAAtGGcCGCTTCtcCGGCTTCATCGACTG 
CGGtCGCCTaGGaGTtGCCGACCGCTACCAGGACATCGCCCTGGCCACC 
CGCGACATCGCtGAGGAGCTtGGCGGCGAGTGGGCCGACCGCTTCtTaG 
TctTGTACGGCATCGCaGCtCCCGACAGCCAGCGCATCGCCTTCTACCG 

CCTGCTcGACGAGTTCTTtTAATGACCAGgCTCTGG (SEQ ID NO:39);
hhyg-1 

CCACTCAGTGGCCACCATGAAGAAGCCCGAGCTGACCGCTACCAGCG 
TTGAGAAGTTCCTGATCGAGAAGTTCGACAGCGTGAGCGACCTGATG 
CAGTTAAGCGAGGGCGAGGAAAGCCGCGCCTTCAGCTTCGATGTCGG 

CGGACGCGGCTATGTACTGCGGGTGAATAGCTGCGCTGATGGCTTCT
ACAAAGACCGCTACGTGTACCGCCACTTCGCCAGCGCTGCACTGCCC 
ATCCCCGAGGTGCTGGACATCGGCGAGTTCAGCGAGAGCCTGACATA 
CTGCATCAGCCGCCGCGCTCAAGGCGTGACTCTCCAAGACCTGCCCG 
AGACAGAGCTGCCCGCTGTGCTACAGCCTGTCGCCGAGGCTATGGAC 

GCTATTGCCGCCGCCGACCTGAGCCAGACCAGCGGCTTCGGCCCATT
CGGGCCCCAAGGCATCGGCCAGTACACCACCTGGCGCGACTTCATCT 
GCGCCATTGCTGATCCCCATGTCTACCACTGGCAGACCGTGATGGAC 
GACACCGTGAGCGCCAGCGTAGCTCAAGCCCTGGACGAGCTGATGCT 
GTGGGCCGAGGACTGCCCCGAGGTGCGCCATCTCGTCCATGCCGACT
TCGGCAGCAACAACGTCCTGACCGACAACGGCCGCATCACCGCCGTA 
ATCGACTGGAGCGAGGCCATGTTGGGGGACAGTCAGTACGAGGTGGC 
CAACATCTTCTTCTGGCGGCCCTGGCTGGCCTGCATGGAGCAGCAAA 
CCCGCTACTTCGAGCGCCGCCATCCCGAGCTGGCCGGCAGCCCCCGT
CTGCGAGCCTACATGCTGCGCATCGGCCTGGATCAGCTCTACCAGAG 
CCTCGTGGACGGCAACTTCGACGATGCTGCCTGGGCTCAAGGCCGCT
GCGATGCCATCGTCCGCAGCGGGGCCGGCACCGTCGGTCGCACACAA 
ATCGCTCGCCGGAGCGCCGCCGTATGGACCGACGGCTGCGTCGAGGT 
GCTGGCCGACAGCGGCAACCGCCGGCCCAGTACACGACCGCGCGCTA
AGGAGTAGTAACCAGCTCTTGG (SEQ ID NO:30); 

hhyg-2: 

CCACTCCGTGGCCACCATGAAGAAGCCCGAGCTGACCGCTACCAGCG

TTGAAAAATTTCTCATCGAGAAGTTCGACAGTGTGAGCGACCTGATG
CAGTTGTCGGAGGGCGAAGAGAGCCGAGCCTTCAGCTTCGATGTCGG 
CGGACGCGGCTATGTACTGCGGGTGAATAGCTGCGCTGATGGCTTCT 
ACAAAGACCGCTACGTGTACCGCCACTTCGCCAGCGCTGCACTACCC 
ATCCCCGAAGTGTTGGACATCGGCGAGTTCAGCGAGAGCCTGACATA 

CTGCATCAGTAGACGCGCCCAAGGCGTTACTCTCCAAGACCTCCCCG
AAACAGAGCTGCCTGCTGTGTTACAGCCTGTCGCCGAAGCTATGGAT 
GCTATTGCCGCCGCCGACCTCAGTCAAACCAGCGGCTTCGGCCCATT 
CGGGCCCCAAGGCATCGGCCAGTACACAACCTGGCGGGATTTCATTT 
GCGCCATTGCTGATCCCCATGTCTACCACTGGCAGACCGTGATGGAC

GACACCGTGTCCGCCAGCGTAGCTCAAGCCCTGGACGAACTGATGCT
GTGGGCCGAAGACTGTCCCGAGGTGCGCCACCTCGTCCATGCCGACT 
TCGGCAGCAACAACGTCCTGACCGACAACGGCCGCATCACCGCCGTA 
ATCGACTGGAGCGAGGCTATGTTCGGGGACAGTCAGTACGAGGTGGC 
CAACATCTTCTTCTGGCGGCCCTGGCTGGCTTGCATGGAGCAGCAGA 

CTCGCTACTTCGAGCGCCGGCATCCCGAGCTGGCCGGCAGCCCTCGT
CTGCGAGCCTACATGCTGCGCATCGGCCTGGATCAGCTCTACCAGAG 
CCTCGTGGACGGCAACTTCGACGATGCTGCCTGGGCTCAAGGCCGCT 
GCGATGCCATCGTCCGCAGCGGGGCCGGCACCGTCGGTCGCACACAA 
ATCGCTCGCCGGAGCGCCGCCGTATGGACCGACGGCTGCGTCGAGGT
GCTGGCCGACAGCGGCAACCGCCGGCCCAGTACACGACCGCGCGCTA 
AGGAGTAGTAACCAGCTCTTGG (SEQ ID NO:42); 

hHygro (SacI site in ORF near 5' end, in-frame linker coding for 12 amino acids inserted at 3' end, and SnaBI site added at 3' end in ORF)

aagcttgctagcgccaccatgaagaagcccgagctcaccgctaccagcgttgaaaaatttctcatcgagaagttcga 
cagtgtgagcgacctgatgcagttgtcggagggcgaagagagccgagccttcagcttcgatgtcggcggacgcgg 
ctatgtactgcgggtgaatagctgcgctgatggcttctacaaagaccgctacgtgtaccgccacttcgccagcgctgc 

actacccatccccgaagtgttggacatcggcgagttcagcgagagcctgacatactgcatcagtagacgcgcccaa
ggcgttactctccaagacctccccgaaacagagctgcctgctgtgttacagcctgtcgccgaagctatggatgctatt 
gccgccgccgacctcagtcaaaccagcggcttcggcccattcgggccccaaggcatcggccagtacacaacctg 
gcgggatttcatttgcgccattgctgatccccatgtctaccactggcagaccgtgatggacgacaccgtgtccgccag 
cgtagctcaagccctggacgaactgatgctgtgggccgaagactgtcccgaggtgcgccacctcgtccatgccgac 

ttcggcagcaacaacgtcctgaccgacaacggccgcatcaccgccgtaatcgactggtccgaagctatgttcgggg
acagtcagtacgaggtggccaacatcttcttctggcggccctggctggcttgcatggagcagcagactcgctacttc 
gagcgccggcatcccgagctggccggcagccctcgtctgcgagcctacatgctgcgcatcggcctggatcagctc 
taccagagcctcgtggacggcaacttcgacgatgctgcctgggctcaaggccgctgcgatgccatcgtccgcagc 
ggggccggcaccgtcggtcgcacacaaatcgctcgccggagcgccgccgtatggaccgacggctgcgtcgaggt 

gctggccgacagcggcaaccgccggcccagtacacgaccgcgcgctaaggagggtggcggagggagcggtgg
cggaggttcctacgtatagtctagactcgag (SEQ ID NO:70); 
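The 3' modifications described for hHygro can be checked by translating the end of SEQ ID NO:70 in frame. The short Python sketch below assumes that the fragment starting at the codon cga, which encodes the native C-terminal residues RPRAKE of hygromycin phosphotransferase, lies in the ORF's reading frame; the names `translate`, `HHYGRO_3PRIME` and `HHYG4_3PRIME` are illustrative and not from the source. Under that assumption the 12 added codons encode GGGGSGGGGSYV, the last two codons (tac gta) form the SnaBI site TACGTA, and the corresponding end of hhyg-4 (SEQ ID NO:71) encodes the same residues with different codons and no stop codon.

```python
# Sketch: translate the 3' end of hHygro (SEQ ID NO:70) and hhyg-4 (SEQ ID NO:71).
# Assumption: both fragments start in the ORF reading frame at the codon "cga".

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def translate(dna: str) -> str:
    """Translate an in-frame DNA fragment codon by codon; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Copied from the 3' ends of SEQ ID NO:70 and SEQ ID NO:71.
HHYGRO_3PRIME = "cgaccgcgcgctaaggagggtggcggagggagcggtggcggaggttcctacgtatag"
HHYG4_3PRIME = "cgaccgcgcgctaaggaaggcggtggaggtagtggtggcggaggtagctacgta"

NATIVE_END = "RPRAKE"  # C-terminal residues of hygromycin phosphotransferase

hhygro_protein = translate(HHYGRO_3PRIME)
# Residues added between the native C-terminus and the stop codon.
linker = hhygro_protein[len(NATIVE_END):hhygro_protein.index("*")]

print(hhygro_protein)                      # RPRAKEGGGGSGGGGSYV*
print(linker, len(linker))                 # GGGGSGGGGSYV 12
print("TACGTA" in HHYGRO_3PRIME.upper())   # True: SnaBI site at the 3' end
print(translate(HHYG4_3PRIME) == NATIVE_END + linker)  # True: same residues, other codons
```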

hhyg-4 

atgaagaagcccgagctcaccgctaccagcgttgaaaaatttctcatcgagaagttcgacagtgtgagcgacctgat 
gcagttgtcggagggcgaagagagccgagccttcagcttcgatgtcggcggacgcggctatgtactgcgggtgaa
tagctgcgctgatggcttctacaaagaccgctacgtgtaccgccacttcgccagcgctgcactacccatccccgaag 
tgttggacatcggcgagttcagcgagagcctgacatactgcatcagtagacgcgcccaaggcgttactctccaaga 
cctccccgaaacagagctgcctgctgtgttacagcctgtcgccgaagctatggatgctattgccgccgccgacctca 
gtcaaaccagcggcttcggcccattcgggccccaaggcatcggccagtacacaacctggcgggatttcatttgcgc 
cattgctgatccccatgtctaccactggcagaccgtgatggacgacaccgtgtccgccagcgtagctcaagccctgg
acgaactgatgctgtgggccgaagactgtcccgaggtgcgccacctcgtccatgccgacttcggcagcaacaacgt 
cctgaccgacaacggccgcatcaccgccgtaatcgactggtccgaagctatgttcggggacagtcagtacgaggtg 
gccaacatcttcttctggcggccctggctggcttgcatggagcagcagactcgctacttcgagcgccggcatcccga 
gctggccggcagccctcgtctgcgagcctacatgctgcgcatcggcctggatcagctctaccagagcctcgtggac
ggcaacttcgacgatgctgcctgggctcaaggccgctgcgatgccatcgtccgcagcggggccggcaccgtcggt 
cgcacacaaatcgctcgccggagcgcagccgtatggaccgacggctgcgtcgaggtgctggccgacagcggca 
accgccggcccagtacacgaccgcgcgctaaggaaggcggtggaggtagtggtggcggaggtagctacgta 
(SEQ ID NO:71);

hneo-4: 

GCTAGCGCCACCATGATCGAACAAGACGGCCTCCATGCTGGCAGTCC 
CGCAGCTTGGGTCGAACGCTTGTTCGGGTACGACTGGGCCCAGCAGA 

CCATCGGATGTAGCGATGCGGCCGTGTTCCGTCTAAGCGCTCAAGGC
CGGCCCGTGCTGTTCGTGAAGACCGACCTGAGCGGCGCCCTGAACGA 
GCTTCAAGACGAGGCTGCCCGCCTGAGCTGGCTGGCCACCACCGGTG 
TACCCTGCGCCGCTGTGTTGGATGTTGTGACCGAAGCCGGCCGGGAC 
TGGCTGCTGCTGGGCGAGGTCCCTGGCCAGGATCTGCTGAGCAGCCA 

CCTTGCCCCCGCTGAGAAGGTTTCCATCATGGCCGATGCAATGCGGC
GCCTGCACACCCTGGACCCCGCTACATGCCCCTTCGACCACCAGGCT 
AAGCATCGGATCGAGCGTGCTCGGACCCGCATGGAGGCCGGCCTGGT 
GGACCAGGACGACCTGGACGAGGAGCATCAGGGCCTGGCCCCCGCT 
GAACTGTTCGCCCGCCTGAAAGCCCGCATGCCGGACGGTGAGGACCT 

GGTTGTGACACATGGTGATGCCTGCCTCCCTAACATCATGGTCGAGA
ATGGCCGCTTCTCCGGCTTCATCGACTGCGGTCGCCTAGGAGTTGCCG 
ACCGCTACCAGGACATCGCCCTGGCCACCCGCGACATCGCTGAGGAG 
CTTGGCGGCGAGTGGGCCGACCGCTTCTTAGTCTTGTACGGCATCGC 
AGCTCCCGACAGCCAGCGCATCGCCTTCTACCGCCTGCTCGACGAGT 

TCTTTTAATCTAGA (SEQ ID NO:72);
and 

hneo-5: 

GCTAGCGCCACCATGATCGAACAAGACGGCCTCCATGCTGGCAGTCC 
CGCAGCTTGGGTCGAACGCTTGTTCGGGTACGACTGGGCCCAGCAGA
CCATCGGATGTAGCGATGCGGCCGTGTTCCGTCTAAGCGCTCAAGGC 
CGGCCCGTGCTGTTCGTGAAGACCGACCTGAGCGGCGCCCTGAACGA 
GCTTCAAGACGAGGCTGCCCGCCTGAGCTGGCTGGCCACCACCGGCG 
TACCCTGCGCCGCTGTGTTGGATGTTGTGACCGAAGCCGGCCGGGAC
TGGCTGCTGCTGGGCGAGGTCCCTGGCCAGGATCTGCTGAGCAGCCA 

CCHTGCCCCCGCTGAGAAGGTTTCT 

GCCTGCACACCCTGGACCCCGCTACCTGCCCCTTCGACCACCAGGCT 
AAGCATCGGATCGAGCGTGCTCGGACCCGCATGGAGGCCGGCCTGGT
GGACCAGGACGACCTGGACGAGGAGCATCAGGGCCTGGCCCCCGCT 
GAACTGTTCGCCCGACTGAAAGCCCGCATGCCGGACGGTGAGGACCT 
GGTTGTCACACACGGAGATGCCTGCCTCCCTAACATCATGGTCGAGA 
ATGGCCGCTTCTCCGGCTTCATCGACTGCGGTCGCCTAGGAGTTGCCG 
ACCGCTACCAGGACATCGCCCTGGCCACCCGCGACATCGCTGAGGAG
CTTGGCGGCGAGTGGGCCGACCGCTTCTTAGTCTTGTACGGCATCGC 
AGCTCCCGACAGCCAGCGCATCGCCTTCTACCGCTTGCTCGACGAGTT 

CTTTTAATGATCTAGA(SEQ ID NO:73). 

The synthetic nucleotide sequences of the invention may be employed in fusion constructs. For instance, a synthetic sequence encoding a selectable polypeptide may be fused to a wild-type sequence or to another synthetic sequence that encodes a different polypeptide. For example, the neo sequence in the following examples of synthetic Renilla luciferase-neo fusion sequences may be replaced with a synthetic neo sequence of the invention:

atggcttccaaggtgtacgaccccgagcaacgcaaacgcatgatcactgggcctcagtggtgggctcgctgcaagc 
aaatgaacgtgctggactccttcatcaactactatgattccgagaagcacgccgagaacgccgtgatttttctgcatgg 
taacgctgcctccagctacctgtggaggcacgtcgtgcctcacatcgagcccgtggctagatgcatcatccctgatct 
gatcggaatgggtaagtccggcaagagcgggaatggctcatatcgcctcctggatcactacaagtacctcaccgctt 

ggttcgagctgctgaaccttccaaagaaaatcatctttgtgggccacgactggggggcttgtctggcctttcactactc
ctacgagcaccaagacaagatcaaggccatcgtccatgctgagagtgtcgtggacgtgatcgagtcctgggacga 
gtggcctgacatcgaggaggatatcgccctgatcaagagcgaagagggcgagaaaatggtgcttgagaataacttc 
ttcgtcgagaccatgctcccaagcaagatcatgcggaaactggagcctgaggagttcgctgcctacctggagccatt 
caaggagaagggcgaggttagacggcctaccctctcctggcctcgcgagatccctctcgttaagggaggcaagcc 

cgacgtcgtccagattgtccgcaactacaacgcctaccttcgggccagcgacgatctgcctaagatgttcatcgagtc
cgaccctgggttcttttccaacgctattgtcgagggagctaagaagttccctaacaccgagttcgtgaaggtgaaggg 
cctccacttcagccaggaggacgctccagatgaaatgggtaagtacatcaagagcttcgtggagcgcgtgctgaag 
aacgagcagaccggtggtgggagcggaggtggcggatcaggtggcggaggctccggagggattgaacaagatg 
gattgcacgcaggttctccggccgcttgggtggagaggctattcggctatgactgggcacaacagacaatcggctg
ctctgatgccgccgtgttccggctgtcagcgcaggggcgcccggttctttttgtcaagaccgacctgtc^ 
gaatgaactgcaggacgaggcagcgcggctatcgtggctggccacgacgggcgttccttgcgcagctgtgctcga 
cgttgtcactgaagcgggaagggactggctgctattgggcgaagtgccggggcaggatctcctgtcatctcaccttg 
ctcctgccgagaaagtatccatcatggctgatgcaatgcggcggctgcatacgcttgatccggctacctgcccattcg
accaccaagcgaaacatcgcatcgagcgagcacgtactcggatggaagccggtcttgtcgatcaggatgatctgga 
cgaagagcatcaggggctcgcgccagccgaactgttcgccaggctcaaggcgcgcatgcccgacggcgaggat 
ctcgtcgtgacccatggcgatgcctgcttgccgaatatcatggtggaaaatggccgcttttctggattcatcgact^ 
gccggctgggtgtggcggaccgctatcaggacatagcgttggctacccgtgatattgctgaagagcttggcggcga 
atgggctgaccgcttcctcgtgctttacggtatcgccgctcccgattcgcagcgcatcgccttctatcgccttcttgacg
agttcttctaa (hrl-neo fusion; SEQ ID NO: 12)
and 

atgattgaacaagatggattgcacgcaggttctccggccgcttgggtggagaggctattcggctatgactgggcaca 
acagacaatcggctgctctgatgccgccgtgttccggctgtcagcgcaggggcgcccggttctttttgtcaagaccg 

acctgtccggtgccctgaatgaactgcaggacgaggcagcgcggctatcgtggctggccacgacgggcgttcctt
gcgcagctgtgctcgacgttgtcactgaagcgggaagggactggctgctattgggcgaagtgccggggcaggatc 
tcctgtcatctcaccttgctcctgccgagaaagtatccatcatggctgatgcaatgcggcggctgcatacgcttgatcc 
ggctacctgcccattcgaccaccaagcgaaacatcgcatcgagcgagcacgtactcggatggaagccggtcttgtc 
gatcaggatgatctggacgaagagcatcaggggctcgcgccagccgaactgttcgccaggctcaaggcgcgcat 

gcccgacggcgaggatctcgtcgtgacccatggcgatgcctgcttgccgaatatcatggtggaaaatggccgctttt
ctggattcatcgactgtggccggctgggtgtggcggaccgctatcaggacatagcgttggctacccgtgatattg^ 
aagagcttggcggcgaatgggctgaccgcttcctcgtgctttacggtatcgccgctcccgattcgcagcgcatcgcc 
ttctatcgccttcttgacgagttcttcaccggtggtgggagcggaggtggcggatcaggtggcggaggctccggag 
gggcttccaaggtgtacgaccccgagcaacgcaaacgcatgatcactgggcctcagtggtgggctcgctgcaagc 

aaatgaacgtgctggactccttcatcaactacta^

taacgctgcctccagctacctgtggaggcacgtcgtgcctcacatcgagcccgtggctagatgcatcatccctgatct 
gatcggaatgggtaagtccggcaagagcgggaatggctcatatcgcctcctggatcactacaagtacctcaccgctt 
ggttcgagctgctgaaccttccaaagaaaatcatctttgtgggccacgactggggggcttgtctggcctttcactactc 
ctacgagcaccaagacaagatcaaggccatcgtccatgctgagagtgtcgtggacgtgatcgagtcctgggacga 

gtggcctgacatcgaggaggatatcgccctgatcaagagcgaagagggcgagaaaatggtgcttgagaataacttc
ttcgtcgagaccatgctcccaagcaagatcatgcggaaactggagcctgaggagttcgctgcctacctggag^ 
caaggagaagggcgaggttagacggcctaccctctcctggcctcgcgagatccctctcgttaagggaggcaag 
cgacgtcgtccagattgtccgcaactacaacgcctaccttcgggccagcgacgatctgcctaagatgttcatcga^ 
cgaccctgggttcttttccaacgctattgtcgagggagctaagaagttccctaacaccgagttcgtgaaggtgaaggg
cctccacttcagccaggaggacgctccagatgaaatgggtaagtacatcaagagcttcgtggagcgcgtgctga^
aacgagcagtaa (neo-hrl-fusion; SEQ ID NO: 13). 
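Both fusion sequences above join the two open reading frames through the same 51-nucleotide stretch beginning accggt. A minimal Python sketch that translates the two junctions, assuming the reading frames and junction boundaries marked in the comments (read off SEQ ID NO:12 and SEQ ID NO:13; the helper names are illustrative), suggests that this stretch encodes the glycine/serine-rich peptide TGGGSGGGGSGGGGSGG, and that in each fusion the downstream ORF resumes at its second codon, i.e. without its initiator methionine.

```python
# Sketch: translate the junctions of the hrl-neo (SEQ ID NO:12) and
# neo-hrl (SEQ ID NO:13) fusions. Frames and boundaries are assumptions.

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def translate(dna: str) -> str:
    """Translate an in-frame DNA fragment codon by codon; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# 51-nt stretch present in both fusions.
LINKER_DNA = "accggtggtgggagcggaggtggcggatcaggtggcggaggctccggaggg"

# hrl-neo: end of Renilla luciferase ORF + linker + neo from its second codon (att).
HRL_NEO_JUNCTION = "gtgctgaagaacgagcag" + LINKER_DNA + "attgaacaagatggattgcacgca"
# neo-hrl: end of neo ORF + linker + Renilla luciferase from its second codon (gct).
NEO_HRL_JUNCTION = "cttgacgagttcttc" + LINKER_DNA + "gcttccaaggtgtacgacccc"

print("linker ", translate(LINKER_DNA), len(LINKER_DNA) // 3)  # TGGGSGGGGSGGGGSGG 17
print("hrl-neo", translate(HRL_NEO_JUNCTION))  # VLKNEQTGGGSGGGGSGGGGSGGIEQDGLHA
print("neo-hrl", translate(NEO_HRL_JUNCTION))  # LDEFFTGGGSGGGGSGGGGSGGASKVYDP
```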

Example 5

Transcription Factor Binding Sites Used to Identify Sites 
in Selected Synthetic Sequences 

TF binding site libraries 

The TF binding site library ("Matrix Family Library") is part of the GEMS Launcher package. Table 16 shows the version of the Matrix Family Library that was used in the design of each synthetic sequence, and Table 17 lists all vertebrate TF binding sites ("matrices") in Matrix Family Library Version 2.4, as well as all changes made to vertebrate matrices in later versions up to 4.1 (section "GENOMATIX MATRIX FAMILY LIBRARY INFORMATION Versions 2.4 to 4.1"). (Genomatix holds the copyright to all Matrix Family Library information.)
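Each entry in the Matrix Family Library is a nucleotide weight matrix. The Python sketch below is a generic illustration, not the Genomatix scoring method, of how such a matrix can flag candidate binding sites in a sequence: it builds a log2-odds position weight matrix from a few invented AP-1-like sites (consensus TGA(C/G)TCA), slides it along a short test sequence, and rescales each window score so that 1.0 is the best possible match. The training sites, pseudocount, uniform background and min-max rescaling are all assumptions chosen for illustration.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds position weight matrix from equal-length aligned sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for col in range(len(sites[0])):
        column = [site[col] for site in sites]
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def scan(pwm, seq):
    """Score every window; return (start, relative score), where 1.0 is the best possible match."""
    width = len(pwm)
    best = sum(max(col.values()) for col in pwm)
    worst = sum(min(col.values()) for col in pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        raw = sum(col[base] for col, base in zip(pwm, window))
        hits.append((start, (raw - worst) / (best - worst)))
    return hits


# Hypothetical training sites; not taken from any Genomatix matrix.
SITES = ["TGACTCA", "TGACTCA", "TGAGTCA", "TGACTAA"]
PWM = build_pwm(SITES)
HITS = scan(PWM, "GGGTGACTCAGGG")

start, similarity = max(HITS, key=lambda hit: hit[1])
print(start, round(similarity, 3))  # 3 1.0
```

Which windows count as hits would in practice depend on a cutoff applied to the rescaled score; any particular threshold is a design choice not specified here.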



Table 16

Synthetic DNA sequence        Genomatix Matrix Family Library
pGL4B-NN3*                    Version 2.4, May 2002
luc2A8 and luc2B10            Version 3.0, Nov 2002; Version 3.1.1, April 2003
hhyg3, hneo3                  Version 3.1.2, June 2003
hhyg4                         Version 3.3, August 2003
SpeI-NcoI-Ver2**              Version 4.0, Nov 2003
hneo5, hpuro2                 Version 4.1, Feb 2004

*NotI-NcoI fragment in pGL4 including amp gene (pGL4B-NN3)
**SpeI-NcoI-Ver2 (replacement for SpeI-NcoI fragment in pGL4B-NN3)
Table 17

GENOMATIX MATRIX FAMILY LIBRARY INFORMATION
Versions 2.4 to 4.1

Matrix Family Library Version 2.4 (May 2002) contains 412 weight matrices in 193 families (Vertebrates: 275 matrices in 106 families).

Vertebrates

Family / matrix    Description

V$AHRR  AHR-arnt heterodimers and AHR-related factors
    V$AHRARNT.01  aryl hydrocarbon receptor / Arnt heterodimers
    V$AHR.01  aryl hydrocarbon / dioxin receptor
    V$AHRARNT.02  aryl hydrocarbon / Arnt heterodimers, fixed core

V$AP1F  AP1 and related factors
    V$AP1.01  AP1 binding site
    V$AP1.02  activator protein 1
    V$AP1.03  activator protein 1
    V$AP1FJ.01  activator protein 1
    V$NFE2.01  NF-E2 p45
    V$VMAF.01  v-Maf
    V$TCF11MAFG.01  TCF11/MafG heterodimers, binding to subclass of AP1 sites
    V$BEL1.01  Bel-1 similar region

V$AP2F  Activator Protein 2
    V$AP2.01  activator protein 2

V$AP4R  AP4 and Related proteins
    V$AP4.01  activator protein 4
    V$AP4.02  activator protein 4
    V$TH1E47.01  Thing1/E47 heterodimer, TH1 bHLH member specific expression in a variety of embryonic tissues
    V$TAL1ALPHAE47.01  Tal-1alpha/E47 heterodimer
    V$TAL1BETAE47.01  Tal-1beta/E47 heterodimer
    V$TAL1BETAITF2.01  Tal-1beta/ITF-2 heterodimer
    V$AP4.03  activator protein 4

V$AREB  Atp1a1 regulatory element binding
    V$AREB6.04  AREB6 (Atp1a1 regulatory element binding factor 6)
    V$AREB6.02  AREB6 (Atp1a1 regulatory element binding factor 6)
    V$AREB6.03  AREB6 (Atp1a1 regulatory element binding factor 6)
    V$AREB6.01  AREB6 (Atp1a1 regulatory element binding factor 6)

V$ARP1  Apolipoprotein AI and CIII gene Repressor Protein
    V$ARP1.01  apolipoprotein AI regulatory protein 1

V$BARB  BARbiturate-Inducible Element box from Pro+eukaryot. genes
    V$BARBIE.01  barbiturate-inducible element

V$BCL6  POZ domain zinc finger expressed in B-Cells
    V$BCL6.01  POZ/zinc finger protein, transcriptional repressor, translocations observed in diffuse large cell lymphoma
    V$BCL6.02  POZ/zinc finger protein, transcriptional repressor, translocations observed in diffuse large cell lymphoma

V$BRAC  Brachyury gene, mesoderm developmental factor
    V$TBX5.01  T-Box factor 5 site, related to Holt-Oram syndrome
    V$BRACH.01  Brachyury

V$BRNF  Brn POU domain factors
    V$BRN3.01  POU transcription factor
    V$BRN2.01  POU factor Brn-2 (N-Oct 3)

V$CABL  C-abl DNA binding sites
    V$CABL.01  Multifunctional c-Abl src type tyrosine kinase

V$CART  Cart-1 (cartilage homeoprotein 1)
    V$XVENT2.01  Xenopus homeodomain factor Xvent-2; early BMP signaling response
    V$CART1.01  Cart-1 (cartilage homeoprotein 1)

V$CDXF  Vertebrate caudal related homeodomain protein
    V$CDX2.01  Cdx-2 mammalian caudal related intestinal transcr. factor

V$CEBP  Ccaat/Enhancer Binding Protein
    V$CEBPB.01  CCAAT/enhancer binding protein beta
    V$CEBP.02  C/EBP binding site

V$CHOP  CHOP binding protein
    V$CHOP.01  heterodimers of CHOP and C/EBPalpha

V$CLOX  CLOX and CLOX homology (CDP) factors
    V$CDPCR3HD.01  cut-like homeodomain protein
    V$CDP.01  cut-like homeodomain protein
    V$CDP.02  transcriptional repressor CDP
    V$CDPCR3.01  cut-like homeodomain protein
    V$CLOX.01  Clox

V$CMYB  C-MYB, cellular transcriptional activator
    V$CMYB.01  c-Myb, important in hematopoiesis, cellular equivalent to avian myeloblastosis virus oncogene v-myb

V$COMP  factors which COoperate with Myogenic Proteins
    V$COMP1.01  COMP1, cooperates with myogenic proteins in multicomponent complex

V$COUP  Repr. of RXR-mediated activ. & retinoic acid responses
    V$COUP.01  COUP antagonizes HNF-4 by binding site competition or synergizes by direct protein-protein interaction with HNF-4

V$CP2F  CP2-erythrocyte Factor related to drosophila Elf1
    V$CP2.01  CP2

V$CREB  cAMP-Responsive Element Binding proteins
    V$CREBP1.01  cAMP-responsive element binding protein 1
    V$CREBP1CJUN.01  CRE-binding protein 1/c-Jun heterodimer
    V$CREB.01  cAMP-responsive element binding protein
    V$HLF.01  hepatic leukemia factor
    V$E4BP4.01  E4BP4, bZIP domain, transcriptional repressor
    V$CREB.02  cAMP-responsive element binding protein
    V$CREB.03  cAMP-response element-binding protein
    V$CREB.04  cAMP-response element binding protein
    V$CREBP1.02  CRE-binding protein 1
    V$ATF.02  ATF binding site
    V$ATF.01  activating transcription factor
    V$TAXCREB.01  Tax/CREB complex
    V$TAXCREB.02  Tax/CREB complex
    V$VJUN.01  v-Jun

V$E2FF  E2F-myc activator/cell cycle regulator
    V$E2F.02  E2F, involved in cell cycle regulation, interacts with Rb p107 protein
    V$E2F.03  E2F, involved in cell cycle regulation, interacts with Rb p107 protein
    V$E2F.01  E2F, involved in cell cycle regulation, interacts with Rb p107 protein

Papilloma virus E2 transcriptional activator
    V$E2.01  bovine papilloma virus regulator E2
    V$E2.02  papilloma virus regulator E2

V$EBOR  E-BOx Related factors
    V$DELTAEF1.01  deltaEF1
    V$XBP1.01  X-box-binding protein 1

V$EBOX  E-BOX binding factors
    V$USF.02  upstream stimulating factor
    V$USF.03  upstream stimulating factor 1
    V$MYCMAX.03  MYC-MAX binding sites
    V$SREBP.03  Sterol regulatory element binding protein
    V$SREBP.02  Sterol regulatory element binding protein
    V$MYCMAX.02  c-Myc/Max heterodimer
    V$NMYC.01  N-Myc
    V$ATF6.01  Member of b-zip family, induced by ER damage/stress
    V$USF.01  upstream stimulating factor
    V$MYCMAX.01  c-Myc/Max heterodimer
    V$MAX.01  Max
    V$ARNT.01  AhR nuclear translocator homodimers
    V$SREBP.01  Sterol regulatory element binding protein 1 and 2

V$ECAT  Enhancer-CcAaT binding factors
    V$NFY.02  nuclear factor Y (Y-box binding factor)
    V$NFY.03  nuclear factor Y (Y-box binding factor)
    V$NFY.01  nuclear factor Y (Y-box binding factor)

V$EGRF  EGR/nerve growth Factor Induced protein C & rel. fact.
    V$EGR1.01  Egr-1/Krox-24/NGFI-A immediate-early gene product
    V$EGR2.01  Egr-2/Krox-20 early growth response gene product
    V$EGR3.01  early growth response gene 3 product
    V$NGFIC.01  nerve growth factor-induced protein C
    V$WT1.01  Wilms Tumor Suppressor

V$EKLF  Erythroid krueppel like factor
    V$EKLF.01  Erythroid krueppel like factor (EKLF)

V$ETSF  Human and murine ETS1 Factors
    V$CETS1P54.01  c-Ets-1(p54)
    V$NRF2.01  nuclear respiratory factor 2
    V$GABP.01  GABP: GA binding protein
    V$ELK1.02  Elk-1
    V$FLI.01  ETS family member FLI
    V$ETS2.01  c-Ets-2 binding site
    V$ETS1.01  c-Ets-1 binding site
    V$ELK1.01  Elk-1
    V$PU1.01  Pu.1 (Pu120) Ets-like transcription factor identified in lymphoid B-cells

V$EVI1  EVI1-myeloid transforming protein
    V$EVI1.06  Ecotropic viral integration site 1 encoded factor
    V$EVI1.02  Ecotropic viral integration site 1 encoded factor
    V$EVI1.03  Ecotropic viral integration site 1 encoded factor
    V$EVI1.05  Ecotropic viral integration site 1 encoded factor
    V$EVI1.04  Ecotropic viral integration site 1 encoded factor
    V$EVI1.01  Ecotropic viral integration site 1 encoded factor

V$FKHD  Fork Head Domain factors
    V$HFH1.01  HNF-3/Fkh Homolog 1
    V$HFH2.01  HNF-3/Fkh Homolog 2
    V$HFH3.01  HNF-3/Fkh Homolog 3 (= Freac-6)
    V$HFH8.01  HNF-3/Fkh Homolog-8
    V$XFD1.01  Xenopus fork head domain factor 1
    V$XFD2.01  Xenopus fork head domain factor 2
    V$XFD3.01  Xenopus fork head domain factor 3
    V$HNF3B.01  Hepatocyte Nuclear Factor 3beta
    V$FREAC2.01  Fork head RElated ACtivator-2
    V$FREAC3.01  Fork head RElated ACtivator-3
    V$FREAC4.01  Fork head RElated ACtivator-4
    V$FREAC7.01  Fork head RElated ACtivator-7

V$GATA  GATA binding factors
    V$LMO2COM.02  complex of Lmo2 bound to Tal-1, E2A proteins, and GATA-1, half-site 2
    V$GATA1.04  GATA-binding factor 1
    V$GATA1.05  GATA-binding factor 1
    V$GATA2.01  GATA-binding factor 2
    V$GATA2.02  GATA-binding factor 2
    V$GATA3.01  GATA-binding factor 3
    V$GATA3.02  GATA-binding factor 3
    V$GATA.01  GATA binding site
    V$GATA1.03  GATA-binding factor 1
    V$GATA1.01  GATA-binding factor 1
    V$GATA1.02  GATA-binding factor 1

V$GFI1  Growth Factor Independence-transcriptional repressor
    V$GFI1.01  growth factor independence 1 zinc finger protein acts as transcriptional repressor

V$GKLF  Gut-enriched Krueppel Like binding Factor
    V$GKLF.01  gut-enriched Krueppel-like factor

V$GREF  Glucocorticoid responsive and related elements
    V$GRE.01  Glucocorticoid receptor, C2C2 zinc finger protein binds glucocorticoid dependent to GREs
    V$ARE.01  Androgen receptor binding site
    V$PRE.01  Progesterone receptor binding site

V$HAML  Human Acute Myelogenous Leukemia factors
    V$AML1.01  runt-factor AML-1

V$HEAT  HEAT shock factors
    V$HSF1.01  heat shock factor 1

V$HEN1  E-box binding factor without transcript. activation
    V$HEN1.01  HEN1
    V$HEN1.02  HEN1

V$HMTB  Human muscle-specific Mt binding site
    V$MTBF.01  muscle-specific Mt binding site

V$HNF1  Hepatic Nuclear Factor 1
    V$HNF1.01  hepatic nuclear factor 1
    V$HNF1.02  Hepatic nuclear factor 1

V$HNF4  Hepatic Nuclear Factor 4
    V$HNF4.01  Hepatic nuclear factor 4
    V$HNF4.02  Hepatic nuclear factor 4

V$HOMS  Homeodomain subfamily S8
    V$S8.01  Binding site for S8 type homeodomains

V$HOXF  Factors with moderate activity to homeo domain consensus sequence
    V$HOXA9.01  Member of the vertebrate HOX-cluster of homeobox factors
    V$HOX1-3.01  Hox-1.3, vertebrate homeobox protein

V$IKRS  Ikaros zinc finger family
    V$LYF1.01  LyF-1, enriched in B and T lymphocytes
    V$IK2.01  Ikaros 2, potential regulator of lymphocyte differentiation
    V$IK1.01  Ikaros 1, potential regulator of lymphocyte differentiation
    V$IK3.01  Ikaros 3, potential regulator of lymphocyte differentiation

V$IRFF  Interferon Regulatory Factors
    V$IRF1.01  interferon regulatory factor 1
    V$IRF2.01  interferon regulatory factor 2
    V$ISRE.01  interferon-stimulated response element

V$LEFF  LEF1/TCF
    V$LEF1.01  TCF/LEF-1, involved in the Wnt signal transduction pathway

V$LTUP  Lentiviral Tata UPstream element
    V$TAACC.01  Lentiviral TATA upstream element

V$MEF2  MEF2-myocyte-specific enhancer-binding factor
    V$MEF2.05  MEF2
    V$MEF2.01  myogenic enhancer factor 2
    V$HMEF2.01  myocyte enhancer factor
    V$MMEF2.01  myocyte enhancer factor
    V$RSRFC4.01  related to serum response factor, C4
    V$RSRFC4.02  related to serum response factor, C4
    V$AMEF2.01  myocyte enhancer factor
    V$MEF2.02  myogenic MADS factor MEF-2
    V$MEF2.03  myogenic MADS factor MEF-2
    V$MEF2.04  myogenic MADS factor MEF-2

V$MEF3  MEF3 BINDING SITES
    V$MEF3.01  MEF3 binding site, present in skeletal muscle-specific transcriptional enhancers

V$MEIS  Homeodomain factor aberrantly expressed in myeloid leukemia
    V$MEIS1.01  Homeobox protein MEIS1 binding site

V$MINI  Muscle INItiator
    V$MUSCLE_INI.01  Muscle Initiator Sequence
    V$MUSCLE_INI.02  Muscle Initiator Sequence
    V$MUSCLE_INI.03  Muscle Initiator Sequence

V$MOKF  Mouse Krueppel like factor
    V$MOK2.01  Ribonucleoprotein associated zinc finger protein MOK-2

V$MTF1  Metal induced transcription factor
    V$MTF-1.01  Metal transcription factor 1, MRE

V$MYOD  MYOblast Determining factor
    V$MYOD.02  myoblast determining factor
    V$MYF5.01  Myf5 myogenic bHLH protein
    V$MYOD.01  myoblast determination gene product
    V$LMO2COM.01  complex of Lmo2 bound to Tal-1, E2A proteins, and GATA-1, half-site 1
    V$E47.01  MyoD/E47 and MyoD/E12 dimers
    V$E47.02  TAL1/E47 dimers

V$MYOF  MYOgenic Factors
    V$NF1.01  nuclear factor 1
    V$MYOGNF1.01  myogenin / nuclear factor 1 or related factors

V$MYT1  Xenopus MYT1 C2HC zinc finger protein
    V$MYT1.02  MyT1 zinc finger transcription factor involved in primary neurogenesis
    V$MYT1.01  MyT1 zinc finger transcription factor involved in primary neurogenesis

V$MZF1  Myeloid Zinc Finger 1 factors
    V$MZF1.01  MZF1

V$NFAT  Nuclear Factor of Activated T-cells
    V$NFAT.01  Nuclear factor of activated T-cells

V$NFKB  Nuclear Factor Kappa B/c-rel
    V$CREL.01  c-Rel
    V$NFKAPPAB.01  NF-kappaB
    V$NFKAPPAB65.01  NF-kappaB (p65)
    V$NFKAPPAB50.01  NF-kappaB (p50)
    V$NFKAPPAB.02  NF-kappaB
    V$NFKAPPAB.03  NF-kappaB

NKX homeodomain sites
    V$NKX25.01  homeo domain factor Nkx-2.5/Csx, tinman sites
    V$NKX25.02  homeo domain factor Nkx-2.5/Csx, tinman homolog low affinity sites
    V$NKX31.01  prostate-specific homeodomain protein NKX3.1

V$NOLF  Neuron-specific OLFactory factor
    V$OLF1.01  olfactory neuron-specific factor

V$NRSF  Neuron-Restrictive Silencer Factor
    V$NRSF.01  neuron-restrictive silencer factor
    V$NRSE.01  neural-restrictive-silencer-element

V$OAZF  Olfactory associated zinc finger protein
    V$ROAZ.01  Rat C2H2 Zn finger protein involved in olfactory neuronal differentiation

V$OCT1  OCTamer binding protein
    V$OCT1.02  octamer-binding factor 1
    V$OCT1.06  octamer-binding factor 1
    V$OCT.01  Octamer binding site (OCT1/OCT2 consensus)
    V$OCT1.05  octamer-binding factor 1
    V$OCT1.04  octamer-binding factor 1
    V$OCT1.03  octamer-binding factor 1
    V$OCT1.01  octamer-binding factor 1

OCT6 binding factors, astrocytes + glioblastoma cells
    V$TST1.01  POU-factor Tst-1/Oct-6

V$OCTP  OCT1 binding factor (POU-specific domain)
    V$OCT1P.01  octamer-binding factor 1, POU-specific domain

V$P53F  p53 tumor suppr.-neg. regulat. of the tumor suppr. Rb
    V$P53.01  tumor suppressor p53

V$PAX1  PAX-1 binding site
    V$PAX1.01  Pax1 paired domain protein, expressed in the developing vertebral column of mouse embryos

V$PAX3  PAX-3 binding sites
    V$PAX3.01  Pax-3 paired domain protein, expressed in embryogenesis, mutations correlate to Waardenburg Syndrome

V$PAX4  Heterogeneous PAX-4 binding sites
    V$PAX4.01  Pax-4 paired domain protein, together with PAX-6 involved in pancreatic development

V$PAX5  PAX-5 / PAX-9 B-cell-specific activating protein
    V$PAX9.01  zebrafish PAX9 binding sites
    V$PAX5.01  B-cell-specific activating protein
    V$PAX5.02  B-cell-specific activating protein

VSPAX6 


Activ. involved in Iris 
development in the 
iij.uuoc eye 


W<t*T> A m 

VIbrAAO.Ul 


Pax-6 paired domain 
protein 






-VSPAX8 : 


PAX-2/5/8 binding ; 

sites ; 

_ j 


VSPAX8.01 J 


i 

PAX 2/5/8 binding site 


! 

1 1 


.. . . 1 


i 

VSPBXF 


Homeo domain factor ' 
PBX-1 


i 

VSPBX1.01 


homeo domain factor 

l UA 1 . 






i 

1 ; 

; i 

V$PCAT | Promoter-CCAAT binding factors
    V$ACAAT.01 | Avian C-type LTR CCAAT box
    V$CAAT.01 | cellular and viral CCAAT box
    V$CLTR_CAAT.01 | Mammalian C-type LTR CCAAT box
V$PDX1 | Pancreatic and intestinal [illegible] factor
    V$PDX1.01 | Pdx1 (IDX1/IPF1) pancreatic and intestinal homeodomain TF
    V$ISL1.01 | Pancreatic and intestinal lim-homeodomain factor
V$PERO | Peroxisome proliferator-activated receptor
    V$PPARA.01 | PPAR/RXR heterodimers
V$PIT1 | GHF-1 pituitary specific pou domain transcription factor
    V$PIT1.01 | Pit1, GHF-1 pituitary specific pou domain transcription factor
V$RARF | Nuclear receptor for retinoic acid, member of nuclear receptors
    [matrix name illegible] | Retinoid receptor-related testis-associated receptor (GCNF/RTR)
V$RBIT | Regulator of B-Cell IgH transcription
    V$BRIGHT.01 | Bright, B cell regulator of IgH transcription
V$RBPF | RBPJ - kappa
    V$RBPJK.01 | Mammalian transcriptional repressor RBP-Jkappa/CBF1
V$REBV | Epstein-Barr virus transcription factor R
    V$EBVR.01 | Epstein-Barr virus transcription factor R
V$RORA | Estrogen receptor and rar-Rel. Orphan Receptor Alpha
    V$RORA1.01 | RAR-related orphan receptor alpha1
    V$RORA2.01 | RAR-related orphan receptor alpha2
    V$ER.01 | estrogen receptor
V$RREB | Ras-REsponsive element Binding protein
    V$RREB1.01 | Ras-responsive element binding protein 1
V$RXRF | RXR heterodimer binding sites
    V$FXRE.01 | Farnesoid X - activated receptor (RXR/FXR dimer)
    V$VDR_RXR.01 | VDR/RXR Vitamin D receptor RXR heterodimer site
    V$VDR_RXR.02 | VDR/RXR Vitamin D receptor RXR heterodimer site
    V$LXRE.01 | Nuclear receptor involved in the regulation of lipid homeostasis
V$SATB | Special AT-rich sequence binding protein
    V$SATB1.01 | Special AT-rich sequence-binding protein, predominantly expressed in thymocytes, binds to matrix attachment regions (MARs)
V$SEF1 | SEF1 protein in mouse retrovirus SL3-3
    V$SEF1.01 | SEF1 binding site
V$SF1F | Vertebrate steroidogenic factor
    V$SF1.01 | SF1 steroidogenic factor 1
V$SMAD | Vertebrate SMAD family of transcription factors
    V$SMAD3.01 | Smad3 transcription factor involved in TGF-beta signaling
    V$SMAD4.01 | Smad4 transcription factor involved in TGF-beta signaling
    V$FAST1.01 | FAST-1 SMAD interacting protein
V$SORY | SOX/SRY-sex/testis determining and related HMG Box factors
    V$SOX5.01 | Sox-5
    V$SRY.01 | sex-determining region Y gene product
    V$HMGIY.01 | HMGI(Y) high-mobility-group protein I (Y), architectural transcription factor organizing the framework of a nuclear protein-DNA transcriptional complex
    V$SOX9.01 | SOX (SRY-related HMG box)
V$SP1F | GC-Box factors SP1/GC
    V$SP1.01 | stimulating protein 1 SP1, ubiquitous zinc finger transcription factor
    V$GC.01 | GC box elements
V$SRFF | Serum Response element binding Factor
    V$SRF.02 | serum response factor
    V$SRF.03 | serum responsive factor
    V$SRF.01 | serum response factor
V$STAT | Signal Transducer and Activator of Transcription factors
    V$STAT.01 | signal transducers and activators of transcription
    V$STAT5.01 | STAT5: signal transducer and activator of transcription 5
    V$STAT6.01 | STAT6: signal transducer and activator of transcription 6
    V$STAT1.01 | signal transducer and activator of transcription 1
    V$STAT3.01 | signal transducer and activator of transcription 3


V$T3RH | Viral homolog of thyroid hormone receptor alpha 1 (AEV vErbA)
    V$T3R.01 | vErbA, viral homolog of thyroid hormone receptor alpha 1
V$TBPF | TATA-Binding Protein Factor
    V$TATA.02 | Mammalian C-type LTR TATA box
    V$ATATA.01 | Avian C-type LTR TATA box
    V$TATA.01 | cellular and viral TATA box elements
    V$MTATA.01 | Muscle TATA box
V$TCFF | TCF11 transcription Factor
    V$TCF11.01 | TCF11/KCR-F1/Nrf1 homodimers
V$TEAF | TEA/ATTS DNA binding domain factors
    V$TEF1.01 | TEF-1 related muscle factor
V$TTFF | Thyroid transcription factor-1
    V$TTF1.01 | Thyroid transcription factor-1 (TTF1) binding site
V$VBPF | chicken Vitellogenin gene Binding Protein factor
    V$VBP.01 | PAR-type chicken vitellogenin promoter-binding protein
138 



WO 2006/034061 



PCT/US2005/033218 











V$VMYB | AMV-viral myb oncogene
    V$VMYB.02 | v-Myb
    V$VMYB.01 | v-Myb
V$WHZF | Winged Helix and ZF5 binding sites
    V$WHN.01 | winged helix protein, involved in hair keratinization and thymus epithelium differentiation
V$XBBF | X-box binding Factors
    V$RFX1.01 | X-box binding protein RFX1
    V$RFX1.02 | X-box binding protein RFX1
    V$MIF1.01 | MIBP-1 / RFX1 complex
V$XSEC | Xenopus SElenocysteine t-RNA activating factor
    V$STAF.02 | Se-Cys tRNA gene transcription activating factor
    V$STAF.01 | Se-Cys tRNA gene transcription activating factor
V$YY1F | activator/repressor binding to transcr. init. site
    V$YY1.01 | Yin and Yang 1
V$ZBPF | Zinc binding protein factor
    V$ZBP89.01 | Zinc finger transcription factor ZBP-89
V$ZFIA | ZincFinger with InterAction domain factors
    V$ZID.01 | zinc finger with interaction domain
Matrix Family Library Version 3.0 (Nov 2002) contains 452 weight matrices in 216 families
(Vertebrates: 314 matrices in 128 families)

New weight matrices - Vertebrates

V$AP1F | AP1 and related factors
    V$BACH1.01 | BTB/POZ-bZIP transcription factor BACH1 forms heterodimers with the small Maf protein family
[family name illegible] | CAS interacting zinc finger protein
    V$NMP4.01 | NMP4 (nuclear matrix protein 4) / CIZ (Cas-interacting zinc finger protein)
V$CREB | cAMP-Responsive Element Binding proteins
    V$ATF6.02 | Activating transcription factor 6, member of b-zip family, induced by ER stress
V$E4FF | Ubiquitous GLI - Krueppel like zinc finger involved in cell cycle regulation
    V$E4F.01 | GLI-Krueppel-related transcription factor, regulator of adenovirus E4 promoter
V$GFI1 | Growth Factor Independence - transcriptional repressor
    V$GFI1B.01 | Growth factor independence 1 zinc finger protein Gfi-1B
V$GLIF | GLI zinc finger family
    V$GLI1.01 | Zinc finger transcription factor GLI1
V$HAML | Human Acute Myelogenous Leukemia factors
    V$AML3.01 | Runt-related transcription factor 2 / CBFA1 (core-binding factor, runt domain, alpha subunit 1)
V$HESF | Vertebrate homologues of enhancer of split complex
    V$HES1.01 | Drosophila hairy and enhancer of split homologue 1 (HES-1)
V$HIFF | Hypoxia inducible factor, bHLH/PAS protein family
    V$HIF1.01 | Hypoxia induced factor-1 (HIF-1)
    V$HIF1.02 | Hypoxia inducible factor, bHLH/PAS protein family
V$HNF6 | Onecut Homeodomain factor HNF6
    V$HNF6.01 | Liver enriched Cut - Homeodomain transcription factor HNF6 (ONECUT)
V$HOXF | Factors with moderate activity to homeo domain consensus sequence
    V$CRX.01 | Cone-rod homeobox-containing transcription factor / otx-like homeobox gene
    V$EN1.01 | Homeobox protein engrailed (en-1)
    V$PTX1.01 | Pituitary Homeobox 1 (Ptx1)
V$IRFF | Interferon Regulatory Factors
    V$IRF3.01 | Interferon regulatory factor 3 (IRF-3)
    V$IRF7.01 | Interferon regulatory factor 7 (IRF-7)
V$MAZF | Myc associated zinc fingers
    V$MAZ.01 | Myc associated zinc finger protein (MAZ)
    V$MAZR.01 | MYC-associated zinc finger protein related transcription factor
V$MEIS | Homeodomain factor aberrantly expressed in myeloid leukemia
    V$MEIS1.01 | Binding site for monomeric Meis1 homeodomain protein
V$MITF | Microphthalmia transcription factor
    V$MIT.01 | MIT (microphthalmia transcription factor) and TFE3
V$MOKF | [partly illegible] like factor
    V$MOK2.02 | Ribonucleoprotein associated zinc finger protein MOK-2 (human)
V$NEUR | NeuroD, Beta2, HLH domain
    V$NEUROD1.01 | DNA binding site for NEUROD1 (BETA-2/E47 dimer)
V$NF1F | Nuclear Factor 1
    V$NF1.02 | Nuclear factor 1 (CTF1)
V$NKXH | NKX/DLX-Homeodomain sites
    V$DLX1.01 | DLX-1, -2, and -5 binding sites
    V$DLX3.01 | Distal-less 3 homeodomain transcription factor
    V$HMX3.01 | H6 homeodomain transcription factor
    V$MSX.01 | Homeodomain proteins MSX-1 and MSX-2
    V$MSX2.01 | Muscle segment homeo box 2, homologue of Drosophila (HOX 8)
V$NRLF | Neural retina leucine zipper
    V$NRL.01 | Neural retinal basic leucine zipper factor (bZIP)
V$PARF | PAR/bZIP family
    V$DBP.01 | Albumin D-box binding protein
V$PBXC | PBX1 - MEIS1 complexes
    V$PBX1_MEIS1.01 | Binding site for a Pbx1/Meis1 heterodimer
    V$PBX1_MEIS1.02 | Binding site for a Pbx1/Meis1 heterodimer
    V$PBX1_MEIS1.03 | Binding site for a Pbx1/Meis1 heterodimer
V$PLZF | C2H2 zinc finger protein PLZF
    V$PLZF.01 | Promyelocytic leukemia zinc finger (TF with nine Krueppel-like zinc fingers)
V$PXRF | Pregnane X receptor
    V$PXRCAR.01 | Halfsite of PXR (pregnane X receptor)/RXR resp. CAR (constitutive androstane receptor)/RXR heterodimer binding site
V$RORA | v-ERB and rar-related Orphan Receptor Alpha
    V$NBRE.01 | Monomers of the nur subfamily of nuclear receptors (nur77, nurr1, nor-1)
V$SF1F | Vertebrate steroidogenic factor
    V$FTF.01 | Alpha (1)-fetoprotein transcription factor (FTF), liver receptor homologue-1 (LRH-1)
V$SIXF | Sine oculis (SO) homeodomain factors
    V$SIX3.01 | [illegible] / SIX domain (SD) and Homeodomain (HD) transcription factor
V$TALE | TALE Homeodomain class recognizing TG motifs
    V$TGIF.01 | TG-interacting factor belonging to TALE class of homeodomain factors









V$ZF5F | zinc finger
    V$ZF5.01 | Zinc finger / POZ domain transcription factor

Weight matrices renamed

• V$MEIS1.01 renamed to V$MEIS1_HOXA9.01

Weight matrices moved to other families

• V$BEL1.01 moved from V$AP1F to V$BEL1
• V$NF1.01 moved from V$MYOF to V$NF1F
• V$ER.01 moved from V$RORA to V$EREF
• V$T3R.01 moved from V$T3RH to V$RORA
• V$CLTR_CAAT.01 moved from V$PCAT to V$RCAT
• V$FAST1.01 moved from V$SMAD to V$FAST

Weight matrices removed

• V$MUSCLE_INI.03

Matrix Family Library Version 3.1 contains 456 weight matrices in 216 families
(Vertebrates: 318 matrices in 128 families)

New weight matrices - Vertebrates

V$LEFF | LEF1/TCF
    V$LEF1.02 | TCF/LEF-1, involved in the Wnt signal transduction pathway
V$PAX2 | PAX-2 binding sites
    V$PAX2.01 | Zebrafish PAX2 paired domain protein
V$PAX5 | PAX-5/PAX-9 B-cell-specific activating protein
    V$PAX5.03 | PAX5 paired domain protein
V$PAX6 | PAX-4/PAX-6 paired domain binding sites
    V$PAX4_PD.01 | PAX4 paired domain binding site
    V$PAX6.02 | PAX6 paired domain and homeodomain are required for binding to this site
V$ZBPF | Zinc binding protein factor
    V$ZF9.01 | Core promoter-binding protein (CPBP) with 3 Krueppel-type zinc fingers

Weight matrices modified

• V$AML1.01
• V$AML3.01

Weight matrices moved to other families

• V$ARNT.01 moved from V$EBOX to V$HIFF (ARNT is a synonym for HIF1B)

Weight matrices removed

• V$SEF1.01
• V$OCT1.03

Version 3.1.1 (April 2003)

Matrices V$IRF3.01 and V$IRF7.01 corrected.

Version 3.1.2 (June 2003)

Matrix V$GFI1B.01 corrected.
Matrix Family Library Version 3.3 (August 2003) contains 485 weight matrices in 233 families
(Vertebrates: 326 matrices in 130 families)

New weight matrices - Vertebrates

V$EREF | Estrogen Response Elements
    V$ER.02 | Canonical palindromic estrogen response element (ERE)
V$SP1F | GC-Box factors SP1/GC
    V$BTEB3.01 | Basic transcription element (BTE) binding protein, BTEB3, FKLF-
V$CDEF | Cell cycle regulators: Cell cycle dependent element
    V$CDE.01 | Cell cycle-dependent element, CDF-1 binding site (CDE/CHR tandem elements regulate cell cycle dependent repression)
V$CHRF | Cell cycle regulators: Cell cycle homology element
    V$CHR.01 | Cell cycle gene homology region (CDE/CHR tandem elements regulate cell cycle dependent repression)
V$HIFF | Hypoxia inducible factor, bHLH/PAS protein family
    V$CLOCK_BMAL1.01 | Binding site of Clock/BMAL1 heterodimer, NPAS2/BMAL1 heterodimer
V$FKHD | Fork Head Domain factors
    V$FKHRL1.01 | Fkh-domain factor FKHRL1 (FOXO)
V$P53F | p53 tumor suppr.-neg. regulat. of the tumor suppr. Rb
    V$P53.02 | Tumor suppressor p53 (5' half site)
    V$P53.03 | Tumor suppressor p53 (3' half site)

Weight matrices modified

• V$GFI1.01
Matrix Family Library Version 4.0 (November 2003) contains 535 weight matrices in 253 families
(Vertebrates: 339 matrices in 136 families)

New weight matrices - Vertebrates

V$AARF | AARE binding factors
    V$AARE.01 | Amino acid response element, ATF4 binding site
V$AP1R | MAF and AP1 related factors
    V$BACH2.01 | Bach2 bound TRE
    V$NRF2.01 | Nuclear factor (erythroid-derived 2)-like 2, NRF2
V$CDXF | Vertebrate caudal related homeodomain protein
    V$CDX1.01 | Intestine specific homeodomain factor CDX-1
V$DEAF | Homolog to deformed epidermal autoregulatory factor-1 from D. melanogaster
    V$NUDR.01 | NUDR (nuclear DEAF-1 related transcriptional regulator protein)
V$ETSF | Human and murine ETS1 factors
    V$ELF2.01 | Ets - family member ELF-2 (NERF1a)
V$GABF | GA-boxes
    V$GAGA.01 | GAGA-Box
V$HNF1 | Hepatic Nuclear Factor 1
    V$HNF1.03 | Hepatic nuclear factor 1
V$HOXF | Factors with moderate activity to homeo domain consensus sequence
    V$GSC.01 | Vertebrate bicoid-type homeodomain protein Goosecoid
V$LHXF | Lim homeodomain factors
    V$LHX3.01 | Homeodomain binding site in LIM/Homeodomain factor LHX3
V$NKXH | NKX/DLX-homeodomain sites
    V$NKX32.01 | Homeodomain protein NKX3.2 (BAPX1, NKX3B, Bagpipe homolog)
V$RBPF | RBPJ - kappa
    V$RBPJK.02 | Mammalian transcriptional repressor RBP-Jkappa/CBF1
V$RP58 | RP58 (ZFP238) zinc finger protein
    V$RP58.01 | Zinc finger protein RP58 (ZNF238), associated preferentially with heterochromatin

Weight matrices modified

• V$GRE.01
• V$NFY.03

Weight matrices moved to other families

• V$BACH1.01 moved from V$AP1F to V$AP1R
• V$NFE2.01 moved from V$AP1F to V$AP1R
• V$TCF11MAFG.01 moved from V$AP1F to V$AP1R
• V$VMAF.01 moved from V$AP1F to V$AP1R

Matrix Family Library Version 4.1 (February 2004) contains 564 weight matrices in 262 families
(Vertebrates: 356 matrices in 138 families)

New weight matrices - Vertebrates

V$BNCF | Basonuclin rDNA transcription factor (PolI)
    V$BNC.01 | Basonuclin, cooperates with USF1 in rDNA PolI transcription
V$CMYB | c-Myb, cellular transcriptional activator
    V$CMYB.02 | c-Myb, important in hematopoiesis, cellular equivalent to avian myeloblastosis virus oncogene v-myb
V$CP2F | CP2-erythrocyte Factor related to drosophila Elf1
    V$CP2.02 | LBP-1c (leader-binding protein-1c), LSF (late SV40 factor), CP2, SEF (SAA3 enhancer factor)
V$EKLF | Basic and erythroid Krueppel like factors
    V$BKLF.01 | Basic krueppel-like factor (KLF3)
V$HAND | bHLH transcription factor dimer of HAND2 and E12
    V$HAND2_E12.01 | Heterodimers of the bHLH transcription factors HAND2 (Thing2) and E12
V$HIFF | Hypoxia inducible factor, bHLH / PAS protein family
    V$DEC1.01 | Basic helix-loop-helix protein known as Dec1, Stra13 or Sharp2
V$HNF6 | Onecut Homeodomain factor HNF6
    V$OC2.01 | CUT-homeodomain transcription factor Onecut-2
V$HOXF | Factors with moderate activity to homeo domain consensus sequence
    V$OTX2.01 | Homeodomain transcription factor Otx2 (homolog of Drosophila orthodenticle)













    V$GSH1.01 | Homeobox transcription factor Gsh-1
V$IRFF | Interferon Regulatory Factors
    V$IRF4.01 | Interferon regulatory factor (IRF)-related protein (NF-EM5, PIP, [illegible])
V$LHXF | Lim homeodomain factors
    V$LMX1B.01 | LIM-homeodomain transcription factor
V$MYT1 | MYT1 C2HC zinc finger protein
    V$MYT1L.01 | Myelin transcription factor 1-like, neuronal C2HC zinc finger factor 1
V$NEUR | NeuroD, Beta2, HLH domain
    V$NEUROG.01 | Neurogenin 1 and 3 (ngn1/3) binding sites
V$VMYB | AMV-viral myb oncogene
    V$VMYB.03 | v-Myb, viral myb variant from transformed BM2 cells
    V$VMYB.04 | v-Myb, AMV v-myb
    V$VMYB.05 | v-Myb variant of AMV v-myb
V$ZBPF | Zinc binding protein factor
    V$ZNF202.01 | Transcriptional repressor, binds to elements found predominantly in genes that participate in lipid metabolism

Weight matrices modified

• V$CMYB.01
• V$PTX1.01

Copyright © Genomatix Software GmbH 1998-2004 - All rights reserved 
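The vertebrate totals in the release notes above can be reconciled with the lists of new and removed matrices. The Python sketch below does that bookkeeping. The matrix names are taken from the listings, and a few were only partly legible in the source. The sketch assumes that only additions and removals change a total. Renamed, moved, modified or corrected matrices, including the 3.1.1 and 3.1.2 corrections, are assumed to leave it unchanged.

```python
"""Cross-check vertebrate matrix counts between Matrix Family Library releases."""

# Vertebrate matrix totals reported in the release notes.
VERTEBRATE_TOTALS = {"3.0": 314, "3.1": 318, "3.3": 326, "4.0": 339, "4.1": 356}
RELEASE_ORDER = ["3.0", "3.1", "3.3", "4.0", "4.1"]

# New and removed vertebrate matrices per release, as listed in the notes.
# Assumption: renamed/moved/modified/corrected matrices do not change the total.
CHANGES = {
    "3.1": {
        "new": ["V$LEF1.02", "V$PAX2.01", "V$PAX5.03", "V$PAX4_PD.01",
                "V$PAX6.02", "V$ZF9.01"],
        "removed": ["V$SEF1.01", "V$OCT1.03"],
    },
    "3.3": {
        "new": ["V$ER.02", "V$BTEB3.01", "V$CDE.01", "V$CHR.01",
                "V$CLOCK_BMAL1.01", "V$FKHRL1.01", "V$P53.02", "V$P53.03"],
        "removed": [],
    },
    "4.0": {
        "new": ["V$AARE.01", "V$BACH2.01", "V$NRF2.01", "V$CDX1.01",
                "V$NUDR.01", "V$ELF2.01", "V$GAGA.01", "V$HNF1.03",
                "V$GSC.01", "V$LHX3.01", "V$NKX32.01", "V$RBPJK.02",
                "V$RP58.01"],
        "removed": [],
    },
    "4.1": {
        "new": ["V$BNC.01", "V$CMYB.02", "V$CP2.02", "V$BKLF.01",
                "V$HAND2_E12.01", "V$DEC1.01", "V$OC2.01", "V$OTX2.01",
                "V$GSH1.01", "V$IRF4.01", "V$LMX1B.01", "V$MYT1L.01",
                "V$NEUROG.01", "V$VMYB.03", "V$VMYB.04", "V$VMYB.05",
                "V$ZNF202.01"],
        "removed": [],
    },
}


def check_release_counts(totals, changes, order):
    """Return (release, expected_total, reported_total) for each release after the first."""
    rows = []
    for prev, cur in zip(order, order[1:]):
        delta = len(changes[cur]["new"]) - len(changes[cur]["removed"])
        rows.append((cur, totals[prev] + delta, totals[cur]))
    return rows


if __name__ == "__main__":
    for release, expected, reported in check_release_counts(
            VERTEBRATE_TOTALS, CHANGES, RELEASE_ORDER):
        status = "ok" if expected == reported else "MISMATCH"
        print(f"{release}: expected {expected}, reported {reported} ({status})")
```

Every release reconciles: 314 + 6 - 2 = 318, 318 + 8 = 326, 326 + 13 = 339 and 339 + 17 = 356.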


Example 6

Summary of Design for Particular Selectable Genes

TF binding sites and search parameters

Each TF binding site ("matrix") belongs to a matrix family that groups functionally similar matrices together. This grouping allows MatInspector professional (the search program) to eliminate redundant matches. Searches were limited to vertebrate TF binding sites. They were performed by matrix family, i.e., the results show only the best match from a family for each site. MatInspector default parameters were used for the core and matrix similarity values (core similarity = 0.75, matrix similarity = optimized).
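The search settings above can be illustrated with a small Python sketch. It is not MatInspector's implementation, and it rests on three assumptions:

- Scoring uses a simplified, unweighted normalized score, (score - min) / (max - min). It is computed over all matrix positions for the matrix similarity, or over a core subset for the core similarity, and it omits any position weighting the real program may apply.
- The per-matrix "optimized" cutoff is represented by a hypothetical `opt_threshold` field.
- "The same site" is taken to mean overlapping coordinates.

```python
"""Sketch: threshold matrix matches and keep the best match per family per site."""
from dataclasses import dataclass

CORE_SIMILARITY = 0.75  # default core similarity cutoff quoted in the text


def normalized_similarity(pwm, window):
    """Return (score - min) / (max - min) for a window scored against a count matrix.

    pwm is a list of {base: count} columns; window must have the same length.
    Simplified and unweighted: not MatInspector's exact formula.
    """
    if len(window) != len(pwm):
        raise ValueError("window length must equal matrix length")
    score = sum(col[base] for col, base in zip(pwm, window))
    lo = sum(min(col.values()) for col in pwm)
    hi = sum(max(col.values()) for col in pwm)
    return (score - lo) / (hi - lo)


@dataclass(frozen=True)
class Match:
    family: str           # e.g. "V$EGRF"
    matrix: str           # e.g. "EGR3.01"
    start: int            # 0-based start on the query sequence
    end: int              # exclusive end
    core_sim: float
    matrix_sim: float
    opt_threshold: float  # hypothetical stand-in for the "optimized" matrix cutoff


def passes(m):
    """Both cutoffs must be met for a match to be reported."""
    return m.core_sim >= CORE_SIMILARITY and m.matrix_sim >= m.opt_threshold


def best_per_family(matches):
    """Drop sub-threshold hits, then keep only the top-scoring hit among
    overlapping hits of the same family (greedy, highest matrix_sim first)."""
    kept = []
    for m in sorted(filter(passes, matches), key=lambda x: x.matrix_sim, reverse=True):
        overlaps = any(k.family == m.family and k.start < m.end and m.start < k.end
                       for k in kept)
        if not overlaps:
            kept.append(m)
    return sorted(kept, key=lambda x: x.start)


if __name__ == "__main__":
    pwm = [
        {"A": 8, "C": 0, "G": 1, "T": 1},
        {"A": 0, "C": 9, "G": 1, "T": 0},
        {"A": 1, "C": 0, "G": 9, "T": 0},
        {"A": 2, "C": 2, "G": 2, "T": 4},
    ]
    print(normalized_similarity(pwm, "ACGT"))  # 1.0: best base at every position

    hits = [
        Match("V$EGRF", "EGR3.01", 10, 24, 0.90, 0.95, 0.90),
        Match("V$EGRF", "EGR1.01", 12, 26, 0.85, 0.92, 0.90),  # overlaps a better EGRF hit
        Match("V$ZBPF", "ZF9.01", 12, 26, 1.00, 0.91, 0.88),   # different family: kept
        Match("V$PAX5", "PAX5.02", 40, 68, 0.70, 0.99, 0.80),  # core similarity < 0.75
    ]
    print([m.matrix for m in best_per_family(hits)])  # ['EGR3.01', 'ZF9.01']
```

Overlapping hits from different families are both reported. This mirrors the statement that only the best match *from a family* is shown for each site.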



Table 18
Gene Designations

A. Synthetic hygromycin gene

Sequence | Description | Matrix library version
[illegible] | from pcDNA3.1/Hygro | Not applicable
hhyg | humanized ORF | Not applicable
hhyg-1 | First removal of undesired sequence matches | Ver 3.1.2 Jun 2003
hhyg-2 | Second removal of undesired sequence matches | Ver 3.1.2 Jun 2003
hhyg-3 | Third removal of undesired sequence matches | Ver 3.1.2 Jun 2003
hHygro | Changes to ORF and add linker | Ver 3.3 Aug 2003
hhyg-4 | Fourth removal of undesired sequence matches | Ver 3.3 Aug 2003

B. Synthetic neomycin gene

Sequence | Description | Matrix library version
neo | from pCI-neo or psiSTRIKE neo | Not applicable
hneo | humanized ORF | Not applicable
hneo-1 | First removal of undesired sequence matches | Ver 3.1.2 Jun 2003
hneo-2 | Second removal of undesired sequence matches | Ver 3.1.2 Jun 2003
hneo-3 | Third removal of undesired sequence matches | Ver 3.1.2 Jun 2003
hneo-4 | Changed 5' and 3' flanking regions/cloning sites | Ver 4.1 Feb 2004
hneo-5 | Fourth removal of undesired sequence matches | Ver 4.1 Feb 2004

C. Synthetic puromycin gene

Sequence | Description | Matrix library version
puro | from psiSTRIKE puromycin | Not applicable
hpuro | humanized ORF | Not applicable
hpuro-1 | First removal of undesired sequence matches | Ver 4.1 Feb 2004
hpuro-2 | Second removal of undesired sequence matches | Ver 4.1 Feb 2004

Note: the above sequence names designate the ORF only (except for hHygro, which includes flanking sequences). Adding "F" to a sequence name indicates the presence of upstream and downstream flanking sequences. Additional letters (e.g., "S") indicate that changes were made only to the flanking regions.
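Each "removal of undesired sequence matches" step in Table 18 changes the DNA sequence of a humanized ORF. One way to make such changes without altering the encoded protein is synonymous codon substitution. The Python sketch below illustrates that idea on a toy ORF. It is not the procedure used to build hhyg, hneo or hpuro, and it rests on several assumptions:

- The codon table covers only the amino acids in the example.
- The motif list is hypothetical; GGGCGG stands in for a GC-box-like match.
- Matches are counted on both strands by plain string search rather than with weight matrices.
- Codon-usage preferences are ignored.

```python
"""Sketch: remove motif matches from an ORF by synonymous codon substitution."""

# Partial standard genetic code: only the amino acids used in the example below.
SYNONYMS = {
    "M": ["ATG"],
    "G": ["GGC", "GGA", "GGG", "GGT"],
    "R": ["CGG", "AGA", "AGG", "CGC", "CGT", "CGA"],
    "L": ["CTG", "CTC", "CTT", "CTA", "TTG", "TTA"],
}
CODON_TO_AA = {codon: aa for aa, codons in SYNONYMS.items() for codon in codons}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(orf):
    """Translate an ORF using the partial table above."""
    return "".join(CODON_TO_AA[orf[i:i + 3]] for i in range(0, len(orf), 3))


def count_hits(seq, motifs):
    """Count motif occurrences on both strands (non-overlapping str.count)."""
    total = 0
    for motif in motifs:
        rc = motif.translate(COMPLEMENT)[::-1]  # reverse complement
        total += seq.count(motif) + (seq.count(rc) if rc != motif else 0)
    return total


def remove_motifs(orf, motifs, max_rounds=50):
    """Greedily swap single codons for synonyms until no motif match remains."""
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    for _ in range(max_rounds):
        current = count_hits("".join(codons), motifs)
        if current == 0:
            return "".join(codons)
        improved = False
        for idx, codon in enumerate(codons):
            for alt in SYNONYMS[CODON_TO_AA[codon]]:
                trial = codons[:idx] + [alt] + codons[idx + 1:]
                if count_hits("".join(trial), motifs) < current:
                    codons, improved = trial, True
                    break
            if improved:
                break
        if not improved:
            raise ValueError("no single synonymous change reduces the match count")
    raise RuntimeError("did not converge within max_rounds")


if __name__ == "__main__":
    orf = "ATGGGGCGGCTG"                # Met-Gly-Arg-Leu, contains GGGCGG
    new_orf = remove_motifs(orf, ["GGGCGG"])
    print(new_orf, translate(new_orf))  # ATGGGCCGGCTG MGRL
```

A change is accepted only if it lowers the total match count, so a substitution that removes one site but creates another elsewhere is rejected. This echoes why Table 18 lists several successive removal rounds rather than a single pass.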



Table 19

Sequences in Synthetic Hygromycin Genes

TFBS in hhyg

Before removal of TFBS from hhyg (94 matches)

V$PCAT/CAAT.01 | cellular and viral CCAAT box
V$MINI/MUSCLE_INI.02 | Muscle Initiator Sequence
V$MINI/MUSCLE_INI.01 | Muscle Initiator Sequence
V$ETSF/PU1.01 | Pu.1 (Pu120) Ets-like transcription factor identified in lymphoid B-cells
V$AHRR/AHRARNT.02 | Aryl hydrocarbon / Arnt heterodimers, fixed core
V$EGRF/EGR3.01 | early growth response gene 3 product
V$AP4R/AP4.01 | Activator protein 4
V$EGRF/NGFIC.01 | Nerve growth factor-induced protein C
V$MAZF/MAZ.01 | Myc associated zinc finger protein (MAZ)
V$ZBPF/ZF9.01 | Core promoter-binding protein (CPBP) with 3 Krueppel-type zinc fingers
V$CREB/ATF6.02 | Activating transcription factor 6, member of b-zip family, induced by ER stress
V$EGRF/EGR3.01 | early growth response gene 3 product
V$ZBPF/ZF9.01 | Core promoter-binding protein (CPBP) with 3 Krueppel-type zinc fingers
V$HIFF/HIF1.02 | Hypoxia inducible factor, bHLH / PAS protein family
V$E2FF/E2F.01 | E2F involved in cell cycle regulation, interacts with Rb p107 protein
V$AP4R/AP4.01 | Activator protein 4
V$HEN1/HEN1.02 | HEN1
V$MYOD/E47.01 | MyoD/E47 and MyoD/E12 dimers
V$EGRF/EGR3.01 | early growth response gene 3 product
V$MOKF/MOK2.02 | Ribonucleoprotein associated zinc finger protein MOK-2 (human)
V$SP1F/GC.01 | GC box elements
V$NRSF/NRSE.01 | Neural-restrictive-silencer-element
V$RORA/RORA2.01 | RAR-related orphan receptor alpha2
V$ZBPF/ZF9.01 | Core promoter-binding protein (CPBP) with 3 Krueppel-type zinc fingers
V$ZF5F/ZF5.01 | Zinc finger / POZ domain transcription factor
V$AHRR/AHRARNT.02 | Aryl hydrocarbon / Arnt heterodimers, fixed core
V$AP1F/TCF11MAFG.01 | TCF11/MafG heterodimers, binding to subclass of AP1 sites
V$EKLF/EKLF.01 | Erythroid krueppel like factor (EKLF)
V$NRSF/NRSF.01 | Neuron-restrictive silencer factor
V$NRSF/NRSE.01 | Neural-restrictive-silencer-element
V$EBOX/MYCMAX.03 | MYC-MAX binding sites
V$RXRF/FXRE.01 | Farnesoid X - activated receptor (RXR/FXR dimer)
V$AHRR/AHRARNT.02 | Aryl hydrocarbon / Arnt heterodimers, fixed core
V$WHZF/WHN.01 | Winged helix protein, involved in hair keratinization and thymus epithelium differentiation
V$EGRF/EGR1.01 | Egr-1/Krox-24/NGFI-A immediate-early gene product
V$SMAD/SMAD3.01 | Smad3 transcription factor involved in TGF-beta signaling
V$MOKF/MOK2.01 | Ribonucleoprotein associated zinc finger protein MOK-2 (mouse)
V$MYOD/MYOD.02 | Myoblast determining factor
V$E4FF/E4F.01 | GLI-Krueppel-related transcription factor, regulator of adenovirus E4 promoter
V$MOKF/MOK2.01 | Ribonucleoprotein associated zinc finger protein MOK-2 (mouse)
V$EGRF/EGR2.01 | Egr-2/Krox-20 early growth response gene product
V$EGRF/EGR3.01 | early growth response gene 3 product
V$HIFF/HIF1.02 | Hypoxia inducible factor, bHLH / PAS protein family
V$EBOX/USF.02 | Upstream stimulating factor
V$HIFF/ARNT.01 | AhR nuclear translocator homodimers
V$ZF5F/ZF5.01 | Zinc finger / POZ domain transcription factor
V$EBOX/ATF6.01 | Member of b-zip family, induced by ER damage/stress, binds to the ERSE in association with NF-Y
V$BEL1/BEL1.01 | Bel-1 similar region (defined in Lentivirus LTRs)
V$NRSF/NRSE.01 | Neural-restrictive-silencer-element
V$MYOD/MYOD.01 | Myoblast determination gene product
V$NEUR/NEUROD1.01 | DNA binding site for NEUROD1 (BETA-2 / E47 dimer)
V$AHRR/AHRARNT.01 | Aryl hydrocarbon receptor / Arnt heterodimers
V$HIFF/ARNT.01 | AhR nuclear translocator homodimers
V$VMYB/VMYB.02 | v-Myb
V$MOKF/MOK2.01 | Ribonucleoprotein associated zinc finger protein MOK-2 (mouse)
V$PAX5/PAX5.01 | B-cell-specific activating protein
V$PBXC/PBX1_MEIS1.02 | Binding site for a Pbx1/Meis1 heterodimer
V$MYOF/MYOGNF1.01 | Myogenin / nuclear factor 1 or related factors
V$SRFF/SRF.03 | Serum responsive factor
V$CP2F/CP2.01 | CP2
V$OAZF/ROAZ.01 | Rat C2H2 Zn finger protein involved in olfactory neuronal differentiation
V$AHRR/AHR.01 | Aryl hydrocarbon / dioxin receptor
V$MINI/MUSCLE_INI.01 | Muscle Initiator Sequence
V$PAX5/PAX5.02 | B-cell-specific activating protein
V$ZBPF/ZF9.01 | Core promoter-binding protein (CPBP) with 3 Krueppel-type zinc fingers
V$EBOX/ATF6.01 | Member of b-zip family, induced by ER damage/stress, binds to the ERSE in association with NF-Y
V$EGRF/NGFIC.01 | Nerve growth factor-induced protein C
V$ZF5F/ZF5.01 | Zinc finger / POZ domain transcription factor
V$AP4R/AP4.01 | Activator protein 4
V$XBBF/MIF1.01 | MIBP-1 / RFX1 complex
V$EGRF/EGR3.01 | early growth response gene 3 product
V$WHZF/WHN.01 | Winged helix protein, involved in hair keratinization and thymus epithelium differentiation
V$PAX5/PAX5.01 | B-cell-specific activating protein
V$WHZF/WHN.01 | Winged helix protein, involved in hair keratinization and thymus epithelium differentiation
V$PAX5/PAX5.01 | B-cell-specific activating protein
V$PAX5/PAX5.03 | PAX5 paired domain protein
V$PAX5/PAX5.03 | PAX5 paired domain protein
V$ZBPF/ZF9.01 | Core promoter-binding protein (CPBP) with 3 Krueppel-type zinc fingers
V$CP2F/CP2.01 | CP2
V$MINI/MUSCLE_INI.02 | Muscle Initiator Sequence
V$AP2F/AP2.01 | Activator protein 2
V$PAX5/PAX5.01 | B-cell-specific activating protein
V$AHRR/AHRARNT.02 | Aryl hydrocarbon / Arnt heterodimers, fixed core
V$MINI/MUSCLE_INI.02 | Muscle Initiator Sequence
V$EGRF/EGR3.01 | early growth response gene 3 product
V$SP1F/SP1.01 | stimulating protein 1 SP1, ubiquitous zinc finger transcription factor
V$ZBPF/ZF9.01 | Core promoter-binding protein (CPBP) with 3 Krueppel-type zinc fingers
V$EGRF/EGR1.01 | Egr-1/Krox-24/NGFI-A immediate-early gene product
V$EGRF/WT1.01 | Wilms Tumor Suppressor
V$SP1F/SP1.01 | stimulating protein 1 SP1, ubiquitous zinc finger transcription factor
V$RCAT/CLTR_CAAT.01 | Mammalian C-type LTR CCAAT box
V$ZBPF/ZF9.01 | Core promoter-binding protein (CPBP) with 3 Krueppel-type zinc fingers
V$EGRF/WT1.01 | Wilms Tumor Suppressor
V$EGRF/WT1.01 | Wilms Tumor Suppressor
V$NF1F/NF1.01 | Nuclear factor 1
V$PDX1/PDX1.01 | Pdx1 (IDX1/IPF1) pancreatic and intestinal homeodomain TF

** Matches are listed in order of occurrence in the corresponding sequence.

TFBS in hhyg3

After removal of TFBS from hhyg2 (3 matches)

V$MINI/MUSCLE_INI.02 | Muscle Initiator Sequence
V$PAX5/PAX5.02 | B-cell-specific activating protein
V$VMYB/VMYB.02 | v-Myb

** Matches are listed in order of occurrence in the corresponding sequence.



TFBS in hHygro

Before removal of TFBS from hHygro (5 matches, excluding linker)

V$MINI/MUSCLE_INI.02 | Muscle Initiator Sequence
V$PAX5/PAX5.02 | B-cell-specific activating protein
V$AREB/AREB6.04 | AREB6 (Atp1a1 regulatory element binding factor 6)
V$VMYB/VMYB.02 | v-Myb
V$CDEF/CDE.01 | Cell cycle-dependent element, CDF-1 binding site (CDE/CHR tandem elements regulate cell cycle dependent repression)

** Matches are listed in order of occurrence in the corresponding sequence.
TFBS in hhyg4

After removal of TFBS from hHygro (4 matches)

V$MINI/MUSCLE_INI.02 | Muscle Initiator Sequence
V$PAX5/PAX5.02 | B-cell-specific activating protein
V$AREB/AREB6.04 | AREB6 (Atp1a1 regulatory element binding factor 6)
V$VMYB/VMYB.02 | v-Myb

** Matches are listed in order of occurrence in the corresponding sequence.

Table 20

Sequences in Synthetic Neomycin Genes

TFBS in hneo

Before removal of TFBS from hneo (69 matches) 







V$PCAT/CAAT.01
cellular and viral CCAAT box

V$ZFIA/ZID.01
Zinc finger with interaction domain

V$AP1F/TCF11MAFG.01
TCF11/MafG heterodimers, binding to subclass of AP1 sites

V$MINI/MUSCLE INI.01
Muscle Initiator Sequence

V$AHRR/AHRARNT.01
Aryl hydrocarbon receptor / Arnt heterodimers

V$HIFF/HIF1.02
Hypoxia inducible factor, bHLH / PAS protein family

V$SP1F/GC.01
GC box elements

V$MINI/MUSCLE INI.02
Muscle Initiator Sequence

V$CP2F/CP2.01
CP2

V$WHZF/WHN.01
Winged helix protein, involved in hair keratinization and thymus epithelium differentiation

V$PAX5/PAX5.02
B-cell-specific activating protein

V$ZF5F/ZF5.01
Zinc finger / POZ domain transcription factor

V$ZBPF/ZF9.01
Core promoter-binding protein (CPBP) with 3 Krueppel-type zinc fingers

V$ZBPF/ZF9.01
Core promoter-binding protein (CPBP) with 3 Krueppel-type zinc fingers

V$HIFF/HIF1.02
Hypoxia inducible factor, bHLH / PAS protein family

V$AHRR/AHRARNT.01
Aryl hydrocarbon receptor / Arnt heterodimers

V$NRSF/NRSE.01
Neural-restrictive-silencer-element

V$HIFF/HIF1.02
Hypoxia inducible factor, bHLH / PAS protein family

V$CREB/ATF6.02
Activating transcription factor 6, member of b-zip family, induced by ER stress

V$RXRF/VDR RXR.01
VDR/RXR Vitamin D receptor RXR heterodimer site

V$PCAT/CAAT.01
cellular and viral CCAAT box

V$NRSF/NRSE.01
Neural-restrictive-silencer-element

V$P53F/P53.01
Tumor suppressor p53

V$NEUR/NEUROD1.01
DNA binding site for NEUROD1 (BETA-2/E47 dimer)

V$EBOX/USF.01
Upstream stimulating factor

V$MYOD/MYOD.02
Myoblast determining factor

V$NRSF/NRSE.01
Neural-restrictive-silencer-element

V$WHZF/WHN.01
Winged helix protein, involved in hair keratinization and thymus epithelium differentiation

V$EBOX/MYCMAX.03
MYC-MAX binding sites

V$HESF/HES1.01
Drosophila hairy and enhancer of split homologue 1 (HES-1)

V$NEUR/NEUROD1.01
DNA binding site for NEUROD1 (BETA-2/E47 dimer)

V$MYOD/MYOD.02
Myoblast determining factor

V$REBV/EBVR.01
Epstein-Barr virus transcription factor R

V$PAX5/PAX5.02
B-cell-specific activating protein

V$ZF5F/ZF5.01
Zinc finger / POZ domain transcription factor

V$ZF5F/ZF5.01
Zinc finger / POZ domain transcription factor

V$EGRF/WT1.01
Wilms Tumor Suppressor

V$EGRF/WT1.01
Wilms Tumor Suppressor

V$ZBPF/ZF9.01
Core promoter-binding protein (CPBP) with 3 Krueppel-type zinc fingers

V$MINI/MUSCLE INI.01
Muscle Initiator Sequence

V$NRSF/NRSF.01
Neuron-restrictive silencer factor

U$PflMI/PflMI 1
REH-IP

V$NRSF/NRSE.01
Neural-restrictive-silencer-element

V$MOKF/MOK2.02
Ribonucleoprotein associated zinc finger protein MOK-2 (human)

Activator protein 2

V$AP1F/AP1FJ.01
Activator protein 1

V$PAX5/PAX5.03
PAX5 paired domain protein

V$EGRF/EGR3.01
early growth response gene 3 product

V$WHZF/WHN.01
Winged helix protein, involved in hair keratinization and thymus epithelium differentiation

V$PAX6/PAX4 PD.01
PAX4 paired domain binding site

V$VMYB/VMYB.02
v-Myb

V$BEL1/BEL1.01
Bel-1 similar region (defined in Lentivirus LTRs)

Ribonucleoprotein associated zinc finger protein MOK-2 (mouse)

V$EGRF/EGR1.01
Egr-1/Krox-24/NGFI-A immediate-early gene product

V$EBOX/ATF6.01
Member of b-zip family, induced by ER damage/stress, binds to the ERSE in association with NF-Y

V$EGRF/EGR3.01
early growth response gene 3 product

V$NRSF/NRSE.01
Neural-restrictive-silencer-element

V$ETSF/ETS1.01
c-Ets-1 binding site

V$NRSF/NRSF.01
Neuron-restrictive silencer factor

V$SP1F/SP1.01
stimulating protein 1 SP1, ubiquitous zinc finger transcription factor

Zinc finger transcription factor ZBP-89

V$PAX5/PAX5.03
PAX5 paired domain protein

V$GREF/ARE.01
Androgen receptor binding site

V$BCL6/BCL6.02
POZ/zinc finger protein, transcriptional repressor, translocations observed in diffuse large cell lymphoma

V$CLOX/CDP.01
cut-like homeodomain protein
**matches are listed in order of occurrence in the corresponding sequence

TFBS in hneo3

After removal of TFBS from hneo2 = before removal of TFBS from hneo3 (0 matches)

TFBS in hneo4

After removal of TFBS from hneo3 = before removal of TFBS from hneo4 (7 matches)







V$PAX5/PAX9.01
Zebrafish PAX9 binding sites

V$AARF/AARE.01
Amino acid response element, ATF4 binding site

V$P53F/P53.02
Tumor suppressor p53 (5' half site)

V$AP1R/BACH2.01
Bach2 bound TRE

V$NEUR/NEUROG.01
Neurogenin 1 and 3 (ngn1/3) binding sites

V$CMYB/CMYB.01
c-Myb, important in hematopoiesis, cellular equivalent to avian myoblastosis virus oncogene v-myb

V$HOXF/CRX.01
Cone-rod homeobox-containing transcription factor / otx-like homeobox gene



**matches are listed in order of occurrence in the corresponding sequence 



TFBS in hneo5 

After removal of TFBS from hneo4 (0 matches) 

Table 21

Sequences in Synthetic Puromycin Genes

TFBS matches in hpuro

Before removal of TFBS from hpuro (68 matches)







V$CDEF/CDE.01
Cell cycle-dependent element, CDF-1 binding site (CDE/CHR tandem elements regulate cell cycle dependent repression)

V$PAX3/PAX3.01
Pax-3 paired domain protein, expressed in embryogenesis, mutations correlate to Waardenburg Syndrome

Activating transcription factor 6, member of b-zip family, induced by ER stress

V$EBOR/XBP1.01
X-box-binding protein 1

V$P53F/P53.03
Tumor suppressor p53 (3' half site)

V$HESF/HES1.01
Drosophila hairy and enhancer of split homologue 1 (HES-1)

V$MTF1/MTF-1.01
Metal transcription factor 1, MRE

V$EKLF/EKLF.01
Erythroid krueppel like factor (EKLF)

V$EGRF/EGR1.01
Egr-1/Krox-24/NGFI-A immediate-early gene product

V$EBOX/ATF6.01
Member of b-zip family, induced by ER damage/stress, binds to the ERSE in association with NF-Y

V$EBOX/ATF6.01
Member of b-zip family, induced by ER damage/stress, binds to the ERSE in association with NF-Y

V$CMYB/CMYB.01
c-Myb, important in hematopoiesis, cellular equivalent to avian myoblastosis virus oncogene v-myb

V$AHRR/AHRARNT.01
Aryl hydrocarbon receptor / Arnt heterodimers

V$EBOX/MYCMAX.03
MYC-MAX binding sites

V$RORA/RORA2.01
RAR-related orphan receptor alpha2

V$EBOX/MYCMAX.03
MYC-MAX binding sites

V$HIFF/HIF1.02
Hypoxia inducible factor, bHLH / PAS protein family

early growth response gene 3 product

V$EGRF/WT1.01
Wilms Tumor Suppressor

V$HAML/AML3.01
Runt-related transcription factor 2 / CBFA1 (core-binding factor, runt domain, alpha subunit 1)

V$PAX5/PAX5.03
PAX5 paired domain protein

V$EBOX/ATF6.01
Member of b-zip family, induced by ER damage/stress, binds to the ERSE in association with NF-Y

V$HIFF/HIF1.02
Hypoxia inducible factor, bHLH / PAS protein family

V$ZBPF/ZBP89.01
Zinc finger transcription factor ZBP-89

V$OAZF/ROAZ.01
Rat C2H2 Zn finger protein involved in olfactory neuronal differentiation

VJ/vLrA-oOX

V$EBOX/MYCMAX.03
MYC-MAX binding sites

V$MYOD/MYF5.01
Myf5 myogenic bHLH protein

V$AP4R/TAL1 BETAE47.01
Tal-1beta/E47 heterodimer

V$NEUR/NEUROG.01
Neurogenin 1 and 3 (ngn1/3) binding sites

V$HAND/HAND2 E12.01
Heterodimers of the bHLH transcription factors HAND2 (Thing2) and E12

V$MAZF/MAZR.01
MYC-associated zinc finger protein related transcription factor

V$ZBPF/ZNF202.01
Transcriptional repressor, binds to elements found predominantly in genes that participate in lipid metabolism

V$SP1F/SP1.01
Stimulating protein 1 SP1, ubiquitous zinc finger transcription factor

V$AP2F/AP2.01
Activator protein 2

V$RREB/RREB1.01
Ras-responsive element binding protein 1

V$XBBF/MIF1.01
MIBP-1 / RFX1 complex

V$CREB/TAXCREB.01
Tax/CREB complex

V$EGRF/EGR3.01
early growth response gene 3 product

V$MOKF/MOK2.01
Ribonucleoprotein associated zinc finger protein MOK-2 (mouse)

V$MOKF/MOK2.01
Ribonucleoprotein associated zinc finger protein MOK-2 (mouse)

V$PAX5/PAX5.01
B-cell-specific activating protein

V$NRSF/NRSE.01
Neural-restrictive-silencer-element

V$MINI/MUSCLE INI.02
Muscle Initiator Sequence

V$EBOX/ATF6.01
Member of b-zip family, induced by ER damage/stress, binds to the ERSE in association with NF-Y

V$DEAF/NUDR.01
NUDR (nuclear DEAF-1 related transcriptional regulator protein)

V$AHRR/AHRARNT.01
Aryl hydrocarbon receptor / Arnt heterodimers

V$ZF5F/ZF5.01
Zinc finger / POZ domain transcription factor

V$EGRF/EGR1.01
Egr-1/Krox-24/NGFI-A immediate-early gene product

V$HIFF/HIF1.02
Hypoxia inducible factor, bHLH / PAS protein family

V$ETSF/ETS1.01
c-Ets-1 binding site

V$STAT/STAT1.01
Signal transducer and activator of transcription 1

V$BCL6/BCL6.01
POZ/zinc finger protein, transcriptional repressor, translocations observed in diffuse large cell lymphoma

V$ZF5F/ZF5.01
Zinc finger / POZ domain transcription factor

V$BCL6/BCL6.02
POZ/zinc finger protein, transcriptional repressor, translocations observed in diffuse large cell lymphoma

V$EGRF/EGR3.01
early growth response gene 3 product

V$CREB/ATF6.02
Activating transcription factor 6, member of b-zip family, induced by ER stress

V$HIFF/HIF1.02
Hypoxia inducible factor, bHLH / PAS protein family

V$EBOR/XBP1.01
X-box-binding protein 1

V$DEAF/NUDR.01
NUDR (nuclear DEAF-1 related transcriptional regulator protein)

V$RXRF/VDR RXR.01
VDR/RXR Vitamin D receptor RXR heterodimer site

V$AP2F/AP2.01
Activator protein 2

V$REBV/EBVR.01
Epstein-Barr virus transcription factor R

V$ZBPF/ZF9.01
Core promoter-binding protein (CPBP) with 3 Krueppel-type zinc fingers

V$MYOD/LMO2COM.01
Complex of Lmo2 bound to Tal-1, E2A proteins, and GATA-1, half-site 1

V$AREB/AREB6.03
AREB6 (Atp1a1 regulatory element binding factor 6)

V$RXRF/FXRE.01
Farnesoid X-activated receptor (RXR/FXR dimer)

V$AHRR/AHR.01
Aryl hydrocarbon / dioxin receptor



**matches are listed in order of occurrence in the corresponding sequence

TFBS matches in hpuro1

After removal of TFBS from hpuro = before removal of TFBS from hpuro1 (4 matches)







V$NEUR/NEUROG.01
Neurogenin 1 and 3 (ngn1/3) binding sites

V$PAX5/PAX5.02
B-cell-specific activating protein

V$REBV/EBVR.01
Epstein-Barr virus transcription factor R

V$AHRR/AHR.01
Aryl hydrocarbon / dioxin receptor

**matches are listed in order of occurrence in the corresponding sequence

TFBS matches in hpuro2

After removal of TFBS from hpuro1 (2 matches)

V$NEUR/NEUROG.01
Neurogenin 1 and 3 (ngn1/3) binding sites

V$BCL6/BCL6.02
POZ/zinc finger protein, transcriptional repressor, translocations observed in diffuse large cell lymphoma

**matches are listed in order of occurrence in the corresponding sequence

Example 7 

Summary of Design of Synthetic Firefly Luciferase Genes

TF binding sites and search parameters 

The TF binding sites are from the TF binding site library ("Matrix Family
Library") that is part of the GEMS Launcher package. Each TF binding site
("matrix") belongs to a matrix family that groups functionally similar matrices



WO 2006/034061 PCT7US2005/033218 

together, eliminating redundant matches by MatInspector professional (the
search program). Searches were limited to vertebrate TF binding sites. Searches
were performed by matrix family, i.e. the results show only the best match from
a family for each site. MatInspector default parameters were used for the core
and matrix similarity values (core similarity = 0.75, matrix similarity =
optimized).
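To make the two thresholds concrete, the Python sketch below scores sequence windows against a nucleotide frequency matrix in the spirit of a core-similarity / matrix-similarity search. It is an approximation under stated assumptions, not MatInspector's implementation: it assumes an information-content-style weight per matrix position over a 4-letter alphabet, a 4-position "core" taken as the most conserved window, a single strand, uppercase A/C/G/T input only, and a caller-supplied matrix threshold standing in for the matrix-specific "optimized" value (the 0.85 default is a hypothetical placeholder). The function names and the toy matrix are illustrative.

```python
import math

BASES = "ACGT"

def consensus_index(column):
    """Conservation weight (0..100) of one matrix column.

    Assumed form: (100 / ln 4) * sum_b f(b) * ln(4 * f(b)) over A, C, G, T;
    zero frequencies contribute nothing.
    """
    return 100.0 / math.log(4) * sum(
        f * math.log(4 * f) for f in column.values() if f > 0
    )

def similarity(matrix, weights, kmer):
    """Weighted score of kmer, rescaled so 0 = worst and 1 = best possible."""
    score = sum(w * col[b] for w, col, b in zip(weights, matrix, kmer))
    best = sum(w * max(col.values()) for w, col in zip(weights, matrix))
    worst = sum(w * min(col.values()) for w, col in zip(weights, matrix))
    if best == worst:  # uninformative matrix: nothing to discriminate
        return 0.0
    return (score - worst) / (best - worst)

def core_start(weights, core_len=4):
    """Start index of the most conserved window (first one wins on ties)."""
    sums = [sum(weights[i:i + core_len]) for i in range(len(weights) - core_len + 1)]
    return max(range(len(sums)), key=sums.__getitem__)

def scan(seq, matrix, core_sim_min=0.75, mat_sim_min=0.85, core_len=4):
    """Return (position, core_sim, matrix_sim) for windows passing both thresholds.

    Only the given strand is scanned; mat_sim_min is a hypothetical stand-in
    for a matrix-specific 'optimized' threshold.
    """
    weights = [consensus_index(col) for col in matrix]
    c0 = core_start(weights, core_len)
    core_matrix, core_weights = matrix[c0:c0 + core_len], weights[c0:c0 + core_len]
    width = len(matrix)
    hits = []
    for pos in range(len(seq) - width + 1):
        site = seq[pos:pos + width]
        cs = similarity(core_matrix, core_weights, site[c0:c0 + core_len])
        if cs < core_sim_min:
            continue  # core filter first; skip the full-matrix score
        ms = similarity(matrix, weights, site)
        if ms >= mat_sim_min:
            hits.append((pos, round(cs, 3), round(ms, 3)))
    return hits
```

With a toy 5-position matrix built as `[{b: (0.85 if b == c else 0.05) for b in BASES} for c in "GGACT"]`, `scan("AAGGACTAA", toy)` returns `[(2, 1.0, 1.0)]`. A window with a single mismatch outside the core (`"GGACA"`) passes the core filter but reaches a matrix similarity of only 0.8, so it is rejected at the 0.85 threshold; this two-stage filter is why a high core similarity alone does not produce a reported match.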



Table 22 
Luc Gene Designations 
Synthetic luc gene (versions A and B)

Sequence*        Description                                       Version

Luc              wild-type gene                                    (not applicable)
luc+             improved gene from Promega's pGL3 vectors         (not applicable)
hluc+            Improved gene from Promega's pGL3(R2.1)-Basic     (not applicable)

Codon optimization strategy A
hluc+ver2A1      codon optimized luc+ (strategy A)                 Ver 3.0 Nov 2002
hluc+ver2A2      First removal of undesired sequence matches       Ver 3.0 Nov 2002
hluc+ver2A3      Second removal of undesired sequence matches      Ver 3.0 Nov 2002
hluc+ver2A4      Third removal of undesired sequence matches       Ver 3.0 Nov 2002
hluc+ver2A5      Fourth removal of undesired sequence matches      Ver 3.0 Nov 2002
hluc+ver2A6      Fifth removal of undesired sequence matches       Ver 3.0 Nov 2002
hluc+ver2A7      Sixth removal of undesired sequence matches       Ver 3.1.1 Apr 2003
hluc+ver2A8      Removal of BglII (RE) site                        Ver 3.1.1 Apr 2003

Codon optimization strategy B
hluc+ver2B1      codon optimized luc+ (strategy B)                 Ver 3.0 Nov 2002
hluc+ver2B2      First removal of undesired sequence matches       Ver 3.0 Nov 2002
hluc+ver2B3      Second removal of undesired sequence matches      Ver 3.0 Nov 2002
hluc+ver2B4      Third removal of undesired sequence matches       Ver 3.0 Nov 2002
hluc+ver2B5      Fourth removal of undesired sequence matches      Ver 3.0 Nov 2002
hluc+ver2B6      Fifth removal of undesired sequence matches       Ver 3.0 Nov 2002
hluc+ver2B7      Sixth removal of undesired sequence matches       Ver 3.1.1 Apr 2003
hluc+ver2B8      Removal of SmaI (RE), Ptx1 (TF) sites             Ver 3.1.1 Apr 2003
hluc+ver2B9      Removal of additional CpG sequences               Ver 3.1.1 Apr 2003
hluc+ver2B10     Removal of BglII (RE) site                        Ver 3.1.1 Apr 2003



* the sequence names designate open reading frames; RE = restriction enzyme 
recognition sequence 
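Table 22 records successive rounds in which undesired sequence matches were removed from a codon-optimized ORF (for example hluc+ver2A2 through hluc+ver2A7, followed by removal of a restriction site). The Python sketch below illustrates that kind of iterative clean-up under loudly stated assumptions: undesired sequences are exact strings searched on one strand only, each match is attacked greedily by swapping one overlapping codon for a synonymous codon from the standard genetic code, and a swap is kept only if it lowers the total number of matches, so the encoded protein never changes. The names `remove_sites`, `find_sites`, `synonyms` and `translate`, and the use of the BglII (AGATCT) and SmaI (CCCGGG) recognition sequences as test motifs, are illustrative; this is not the procedure used to produce the genes in Table 22, which also involved codon optimization and matrix-based TF binding site searches that the sketch does not model.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
_BASES = "TCAG"
_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AAS)}

def synonyms(codon):
    """Other codons encoding the same amino acid (or stop), in table order."""
    aa = CODE[codon]
    return [c for c, a in CODE.items() if a == aa and c != codon]

def translate(seq):
    """Translate an in-frame DNA string codon by codon."""
    return "".join(CODE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))

def find_sites(seq, motifs):
    """All (start, motif) exact matches on the given strand, sorted by position."""
    hits = []
    for m in motifs:
        start = seq.find(m)
        while start != -1:
            hits.append((start, m))
            start = seq.find(m, start + 1)  # allows overlapping matches
    return sorted(hits)

def remove_sites(orf, motifs, max_rounds=10):
    """Greedy synonymous-codon clean-up; assumes len(orf) is a multiple of 3.

    Returns the edited ORF and the number of matches seen at the start of
    each round (the last entry is 0 if every match was removed).
    """
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    history = []
    for _ in range(max_rounds):
        hits = find_sites("".join(codons), motifs)
        history.append(len(hits))
        if not hits:
            break
        for start, motif in hits:
            current = len(find_sites("".join(codons), motifs))
            first, last = start // 3, (start + len(motif) - 1) // 3
            swapped = False
            for ci in range(first, last + 1):  # codons overlapping this match
                for alt in synonyms(codons[ci]):
                    trial = codons[:ci] + [alt] + codons[ci + 1:]
                    if len(find_sites("".join(trial), motifs)) < current:
                        codons, swapped = trial, True
                        break
                if swapped:
                    break
    return "".join(codons), history
```

For example, `remove_sites("ATGAGATCTCCCGGGTAA", ["AGATCT", "CCCGGG"])` returns `("ATGCGTTCTCCTGGGTAA", [2, 0])`: two matches before the first round and none after it, while `translate` gives `MRSPG*` for both the input and the output. Recording the per-round counts mirrors how the accompanying tables report the number of matches remaining at each design stage; a real design would also scan the reverse strand and check that swaps do not create new undesired motifs elsewhere.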

Table 23 

Sequences in Synthetic Luc Genes (version A) 
TFBS in hluc+ver2A1

Before removal of TFBS from hluc+ver2A1 (110 matches)



V$MINI/MUSCLE INI.02
Muscle Initiator Sequence

V$WHZF/WHN.01
winged helix protein, involved in hair keratinization and thymus epithelium differentiation

V$GREF/PRE.01
Progesterone receptor binding site

V$MAZF/MAZR.01
MYC-associated zinc finger protein related transcription factor

V$SP1F/SP1.01
stimulating protein 1 SP1, ubiquitous zinc finger transcription factor

V$ZBPF/ZBP89.01
Zinc finger transcription factor ZBP-89

V$SF1F/SF1.01
SF1 steroidogenic factor 1

V$EGRF/NGFIC.01
Nerve growth factor-induced protein C

V$MINI/MUSCLE INI.01
Muscle Initiator Sequence

V$EGRF/EGR2.01
Egr-2/Krox-20 early growth response gene product

V$ZF5F/ZF5.01
Zinc finger / POZ domain transcription factor

V$HESF/HES1.01
Drosophila hairy and enhancer of split homologue 1 (HES-1)

V$NRSF/NRSE.01
neural-restrictive-silencer-element

V$PAX5/PAX5.02
B-cell-specific activating protein

V$HAML/AML3.01
Runt-related transcription factor 2 / CBFA1 (core-binding factor, runt domain, alpha subunit 1)

V$GREF/PRE.01
Progesterone receptor binding site

V$P53F/P53.01
tumor suppressor p53

V$ZF5F/ZF5.01
Zinc finger / POZ domain transcription factor

V$EBOX/ATF6.01
Member of b-zip family, induced by ER damage/stress, binds to the ERSE in association with NF-Y

V$EGRF/EGR3.01
early growth response gene 3 product

V$NF1F/NF1.01
Nuclear factor 1

V$EGRF/EGR3.01
early growth response gene 3 product

V$REBV/EBVR.01
Epstein-Barr virus transcription factor R

V$MOKF/MOK2.01
Ribonucleoprotein associated zinc finger protein MOK-2 (mouse)

V$PBXC/PBX1 MEIS1
Binding site for a Pbx1/Meis1 heterodimer

V$XSEC/STAF.01
Se-Cys tRNA gene transcription activating factor

V$COMP/COMP1.01
COMP1, cooperates with myogenic proteins in multicomponent complex

V$MYOF/MYOGNF1.01
Myogenin / nuclear factor 1 or related factors

V$NEUR/NEUROD1.01
DNA binding site for NEUROD1 (BETA-2 / E47 dimer)

V$MYOD/MYOD.02
myoblast determining factor

V$AP2F/AP2.01
Activator protein 2

V$EVI1/EVI1.02
Ecotropic viral integration site 1 encoded factor

V$SMAD/SMAD4.01
Smad4 transcription factor involved in TGF-beta signaling

V$MYOD/MYF5.01
Myf5 myogenic bHLH protein

V$HESF/HES1.01
Drosophila hairy and enhancer of split homologue 1 (HES-1)

V$PAX5/PAX5.01
B-cell-specific activating protein

V$EBOX/ATF6.01
Member of b-zip family, induced by ER damage/stress, binds to the ERSE in association with NF-Y

V$SP1F/GC.01
GC box elements

V$MAZF/MAZR.01
MYC-associated zinc finger protein related transcription factor

V$RREB/RREB1.01
Ras-responsive element binding protein 1

V$AHRR/AHRARNT.01
Aryl hydrocarbon receptor / Arnt heterodimers

V$HIFF/HIF1.02
Hypoxia inducible factor, bHLH / PAS protein family

V$ZF5F/ZF5.01
Zinc finger / POZ domain transcription factor

V$EBOX/ATF6.01
Member of b-zip family, induced by ER damage/stress, binds to the ERSE in association with NF-Y

V$YY1F/YY1.01
Yin and Yang 1

V$ETSF/GABP.01
GABP: GA binding protein

V$MOKF/MOK2.01
Ribonucleoprotein associated zinc finger protein MOK-2 (mouse)

V$ETSF/ELK1.02
Elk-1

V$EBOX/MYCMAX.03
MYC-MAX binding sites

V$E4FF/E4F.01
GLI-Krueppel-related transcription factor, regulator of adenovirus E4 promoter

V$XBBF/RFX1.01
X-box binding protein RFX1

V$EVI1/EVI1.06
Ecotropic viral integration site 1 encoded factor

V$MOKF/MOK2.01
Ribonucleoprotein associated zinc finger protein MOK-2 (mouse)

V$NF1F/NF1.01
Nuclear factor 1

V$PBXC/PBX1 MEIS1.02
Binding site for a Pbx1/Meis1 heterodimer

V$ZF5F/ZF5.01
Zinc finger / POZ domain transcription factor

V$HESF/HES1.01
Drosophila hairy and enhancer of split homologue 1 (HES-1)

V$PAX5/PAX5.01
B-cell-specific activating protein

V$ETSF/GABP.01
GABP: GA binding protein

V$MYOD/MYOD.02
myoblast determining factor

V$XSEC/STAF.01
Se-Cys tRNA gene transcription activating factor

V$OAZF/ROAZ.01
Rat C2H2 Zn finger protein involved in olfactory neuronal differentiation

V$AP2F/AP2.01
Activator protein 2

V$PAX3/PAX3.01
Pax-3 paired domain protein, expressed in embryogenesis, mutations correlate to Waardenburg Syndrome

V$AP2F/AP2.01
Activator protein 2

V$MTF1/MTF-1.01
Metal transcription factor 1, MRE

V$SF1F/FTF.01
Alpha (1)-fetoprotein transcription factor (FTF), liver receptor homologue-1 (LRH-1)

V$SMAD/SMAD4.01
Smad4 transcription factor involved in TGF-beta signaling

V$NFKB/NFKAPPAB.01
NF-kappaB

V$EKLF/EKLF.01
Erythroid krueppel like factor (EKLF)

V$CREB/TAXCREB.01
Tax/CREB complex

Viczrr/iiZr.u,}
E2F, involved in cell cycle regulation, interacts with Rb p107 protein

V$CP2F/CP2.01
CP2

V$AHRR/AHRARNT.01
Aryl hydrocarbon receptor / Arnt heterodimers

V$EGRF/EGR2.01
Egr-2/Krox-20 early growth response gene product

V$ZF5F/ZF5.01
Zinc finger / POZ domain transcription factor

V$EBOR/XBP1.01
X-box-binding protein 1

V$FKHD/XFD3.01
Xenopus fork head domain factor 3

V$AP2F/AP2.01
Activator protein 2

V$EGRF/NGFIC.01
Nerve growth factor-induced protein C

V$PCAT/ACAAT.01
Avian C-type LTR CCAAT box

V$PBXC/PBX1 MEIS1.02
Binding site for a Pbx1/Meis1 heterodimer

V$AHRR/AHRARNT.02
Aryl hydrocarbon / Arnt heterodimers, fixed core

V$MOKF/MOK2.01
Ribonucleoprotein associated zinc finger protein MOK-2 (mouse)

Glucocorticoid receptor, C2C2 zinc finger protein binds glucocorticoid dependent to GREs

V$NEUR/NEUROD1.01
DNA binding site for NEUROD1 (BETA-2 / E47 dimer)

V$NRSF/NRSE.01
neural-restrictive-silencer-element

V$NRSF/NRSE.01
neural-restrictive-silencer-element

V$AHRR/AHRARNT.02
Aryl hydrocarbon / Arnt heterodimers, fixed core

V$EBOX/ATF6.01
Member of b-zip family, induced by ER damage/stress, binds to the ERSE in association with NF-Y

V$HIFF/HIF1.02
Hypoxia inducible factor, bHLH / PAS protein family

V$EGRF/EGR3.01
early growth response gene 3 product

V$EGRF/EGR3.01
early growth response gene 3 product

V$WHZF/WHN.01
winged helix protein, involved in hair keratinization and thymus epithelium differentiation

V$AP2F/AP2.01
Activator protein 2

V$HIFF/HIF1.02
Hypoxia inducible factor, bHLH / PAS protein family

V$NRSF/NRSE.01
neural-restrictive-silencer-element

V$ZFIA/ZID.01
zinc finger with interaction domain

V$SMAD/SMAD4.01
Smad4 transcription factor involved in TGF-beta signaling

V$AHRR/AHRARNT.02
Aryl hydrocarbon / Arnt heterodimers, fixed core

V$EBOX/MYCMAX.01
c-Myc/Max heterodimer

V$EBOX/USF.03
upstream stimulating factor

V$EGRF/EGR1.01
Egr-1/Krox-24/NGFI-A immediate-early gene product

V$MINI/MUSCLE INI.01
Muscle Initiator Sequence

V$MOKF/MOK2.01
Ribonucleoprotein associated zinc finger protein MOK-2 (mouse)

V$NRSF/NRSE.01
neural-restrictive-silencer-element

V$NF1F/NF1.01
Nuclear factor 1

V$SF1F/SF1.01
SF1 steroidogenic factor 1



**matches are listed in order of occurrence in the corresponding sequence 
TFBS in hluc+ver2A3

After removal of TFBS from hluc+ver2A2 = before removal of TFBS
from hluc+ver2A3 (8 matches)

V$EGRF/EGR2.01
Egr-2/Krox-20 early growth response gene product

V$HAML/AML3.01
Runt-related transcription factor 2 / CBFA1 (core-binding factor, runt domain, alpha subunit 1)

V$MYOF/MYOGNF1.01
Myogenin / nuclear factor 1 or related factors

V$NF1F/NF1.01
Nuclear factor 1

V$ETSF/GABP.01
GABP: GA binding protein

V$NFKB/NFKAPPAB.01
NF-kappaB

V$EKLF/EKLF.01
Erythroid krueppel like factor (EKLF)

V$FKHD/XFD3.01
Xenopus fork head domain factor 3

**matches are listed in order of occurrence in the corresponding sequence



TFBS in hluc+ver2A6

After removal of TFBS from hluc+ver2A5 (2 matches)

V$HAML/AML3.01
Runt-related transcription factor 2 / CBFA1 (core-binding factor, runt domain, alpha subunit 1)

V$FKHD/XFD3.01
Xenopus fork head domain factor 3

**matches are listed in order of occurrence in the corresponding sequence

TFBS in hluc+ver2A6

Before removal of TFBS from hluc+ver2A6 (4 matches)

V$PAX5/PAX5.03
PAX5 paired domain protein

V$LEFF/LEF1.02
TCF/LEF-1, involved in the Wnt signal transduction pathway

V$IRFF/IRF7.01
Interferon regulatory factor 7 (IRF-7)

V$FKHD/XFD3.01
Xenopus fork head domain factor 3

**matches are listed in order of occurrence in the corresponding sequence
TFBS in hluc+ver2A7

After removal of TFBS from hluc+ver2A6 = before removal of TFBS
from hluc+ver2A7 (1 match)

V$FKHD/XFD3.01
Xenopus fork head domain factor 3

TFBS in hluc+ver2A8

After removal of TFBS from hluc+ver2A7 (1 match)

V$FKHD/XFD3.01
Xenopus fork head domain factor 3

Table 24 

Sequences in Synthetic Luc Genes (version B) 

TFBS in hluc+ver2B1

Before removal of TFBS from hluc+ver2B1 (187 matches)







VSHOXF/PTX1.01 


Pituitary Homeobox 1 (Ptxl) 


VSOCT1/OCT1.04 | 


octamer-binding factor 1 


V$OCTP/OCT1P.01 ! 


octamer-binding factor 1, POU-specific domain j 


VSNKXH/NKX25.02 


homeo domain factor Nkx-2.5/Csx, tinman 
homolog low affinity sites 


V$B ARB/BARBIE. 0 1 


barbiturate-inducible element 


VSTBPF/TATA.01 1 


cellular and viral TATA box elements j 


VSGATA/GATA.0 1 j 


GATA binding site (consensus) 


VSAP4R/AP4.01 j 


Activator protein 4 J 


VSHEN1/HEN1.02 i 


tffiNl 


VSSRFF/SRF.01 ! 


serum response factor 


VSPARF/DBP.01 i 


Albumin D-box binding protein j 


VSMOKF/MOK2.01 I 


Ribonucleoprotein associated zinc finger protein! 
MOK-2 (mouse) j 


VSEVI1/EVI1.04 ! 


Ecotropic viral integration site 1 encoded factor 


VSGni/GfIlB.01 i 


Growth factor independence 1 zinc finger j 
protein Gfi-IB ( 


VSRBPF/RBPJK.01 j 


Mammalian transcriptional repressor RBP- j 
Jkappa/CBFl j 


VSTBPF/TATA.02 


Mammalian C-type LTR TATA box \ 
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VSAP4R/TAL1 ALPHAE47.01 


Tal-lalpha/E47 heterodimer 


VSSRFF/SRF.01 


serum response factor 


VSOCTP/OCT1P.01 


octamer-binding factor 1, POU-specific domain ( 


VSBRNF/BRN2.01 


POU factor Bm-2(N-Oct 3) J 


VSCREB/E4BP4.0 1 


E4BP4, bZIP domain, transcriptional repressor j 


VSVBPF/VBP.01 


PAR-type chicken vitellogenin promoter- 
binding protein 


VSEVI1/EVI1.04 


Ecotropic viral integration site 1 encoded factor | 


VSCLOX/CDPCR3 .0 1 j 


cut-like homeodomain protein _ j 


VRGFTI/OfnR 01 

V J) v_J L x if VJ i-l 1 U. \J 1 


Growth factor independence 1 zinc finger ' 
protein Gfi-IB 


VSGATA/LMO2COM.02 


complex of Lmo2 bound to Tal-1, E2A proteins, 
and GATA-1, half-site 2 


VSSRFF/SRF.01 


serum response factor 


VSHOXT/MEIS1 HOXA9.01 > 


Homeobox protein MEIS1 binding site 


VSOCT1/OCT1.03 \ 


octamer-binding factor 1 ! 


i 

VSGFIl/GFIl.Ol ! 


Growth factor independence 1 zinc finger 
protein acts as transcriptional repressor 




Liver enriched Cut - Homeodomain 
transcription factor HNF6 (ONECUT) 


VSHAML/AML1.01 | 


runt-factor AML- 1 > 


VSGREF/PRE.01 j 


: j 

Progesterone receptor binding site j 


VSSTAT/STAT5.01 ' 


■■ — — ■■ - ■■■■■ ■■ ■ I 

STAT5 : signal transducer and activator of j 

transcription 5 i 


V$TBPF/TATA.01 


cellular and viral TATA box elements i 


VSCLOX/CDP.01 


cut-like homeodomain protein : 


V$FKHD/HFH8.01 j 


HNF-3/Fkh Homolog-8 | 


VSFAST/FAST1.01 


FAST-1 SMAD interacting protein ! 


VSGFIl/GfllB.Ol 

_^ 


Growth factor independence 1 zinc finger 
protein on- 1x3 


\r®£~* a DT/P ADT1 ni i 
V^L/AKl/UAKI 1.U1 I 


Cart-1 (cartilage homeoprotein 1) 


V 3>rlM 1 Jd/JVL 1 t>r . U i 1 


muscie-specmc ivix Dinaing sue 


VSTBPF/TATA.0 1 | 


cellular and viral TATA box elements 


VSFKHD/XFD2.0 1 j 


Xenopus fork head domain factor 2 


VSBRNF/BRN2.0 1 j 


POU factor Bm-2 j?W)ct_3) ^ _ j 


VSMEF2/AMEF2.0 1 i 


myocyte enhancer factor i 


VSBRNF/BRN2.0 1 ! 


POU factor Brn-2 (N-Oct 3) j 


i 

VSBEL1/BEL1.01 


Bel-1 similar region (defined in Lenti virus j 
LTRs) j 


VSNOLF/OLF1.01 


olfactory neuron-specific factor \ 
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VSOCT1/OCT1.06 


octamer-binding factor 1 


VSNFKB/NFKAPP AB . 02 


NF-kappaB 


VSBCL6/BCL6.02 


POZ/zinc finger protein, transcriptional 
repressor, translocations observed in diffuse 
large cell lymphoma 


VSMOKF/MOK2.01 


Ribonucleoprotein associated zinc finger protein 
MOK-2 (mouse) 


VSHEAT/HSF1.01 


heat shock factor 1 


VSOCTP/OCT1P.01 j 


octamer-binding factor 1, POU-specific domain 


v$piti/piti.oi 


Pit 1 , GHF- 1 pituitary specific pou domain 
transcription factor 


VSHOXF/CRX.0 1 


Cone-rod homeobox-containing transcription 
factor / otx-like homeobox gene 


VSHNF6/HNF6.01 


Liver enriched Cut - Homeodomain 
transcription factor HNF6 (ONECUT) 


VSCLOX/CLOX.0 1 


Clox 


VSBCL6/BCL6.02 | 


POZ/zinc finger protein, transcriptional i 
repressor, translocations observed in diffuse i 
large cell lymphoma j 


VSHOXF/PTX1.01 


Pituitary Homeobox 1 (Ptxl) J 


VSGATA/GATAl .02 


GATA-binding factor 1 


VSFKHD/FREAC4.0 1 | 


Fork head RElated ACtivator-4 


■" ' """1 

i 

VSE4FF/E4F.01 ! 


GLI-Krueppel-related transcription factor, 
regulator of adenovirus E4 promoter 


VKPDX1/TST 1 01 - 


Pancreatic and intestinal lim-homeodomain 
factor 


VSCART/CART1.01 


Cart-1 (cartilage homeoprotein 1) 


VSGFI1/GFI1.01 j 


Growth factor independence 1 zinc finger 
protein acts as transcriptional repressor 


V < CTT?T7T7/TRTT / 1 ftl i 
v ^iivrr/Livr j.ui | 


mteneron regulatory iactor 5 {ikt-j ) \ 




barbiturate-inducible element 




homeo domain factor Pbx-1 ! 


V j» 11 V 1 1 / XI V 1 1 . uz 


Ecotropic viral integration site 1 encoded factor t 


V J>VJr\ 1/VUa 1 /\Z.U 1 j 


i A-omaing iacxor z 


V a>JjXVL>)r/IJl\J.NZ.U i j 


POTT -fnr>ff\f Dm O f\T Or»t 1"\ 

r vju iacior orn-z {in-uci j j 


VSPARF/DBP.01 


Albumin D-box binding nrotein 


VSBRNF/BRN3.0 1 ! 


POU transcription factor Brn-3 


VSZBPF/ZBP89.01 j 


Zinc finger transcription factOT ZBP-89 j 


VSCREB/TAXCREB.02 i 


Tax/CREB complex _ _ j 


VSGREF/PRE.01 \ 


Progesterone receptor binding site j 


VSRBPF/RBP JK. 0 1 i 


Mammalian transcriptional repressor RBP- 1 
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Jkappa/CBFl 


VSGATA/GATA3 .02 


GATA-binding factor 3 


VSSTAT/STAT.01 


signal transducers and activators of transcription 


VSIKRS/IK2.01 


Bcaros 2, potential regulator of lymphocyte 
differentiation 1 


VSSRFF/SRF.01 


serum response factor 


VSSEF1/SEFL01 


SEF1 binding site 


VSHAML/AML1.01 


runt-factor AML-1 j 


1 

V$MOKF/M OK2 02 1 


Ribonucleoprotein associated zinc finger protein 
MOK-2 (human) j 


VSFKHD/FREAC2.0 1 


Fork head RElated ACtivator-2 


VSHMTB/MTBF.0 1 


muscle-specific Mt binding site 


VSGFI1/GFI1.01 


Growth factor independence 1 zinc finger 
protein acts as transcriptional repressor 


VSECAT/NFY.03 


nuclear factor Y (Y -box binding factor) 


V$HOXT/MEIS1_HOXA9.01
Homeobox protein MEIS1 binding site

V$PCAT/ACAAT.01
Avian C-type LTR CCAAT box

V$HNF6/HNF6.01
Liver enriched Cut-Homeodomain transcription factor HNF6 (ONECUT)

V$CLOX/CLOX.01
Clox

V$GATA/GATA3.02
GATA-binding factor 3

V$AREB/AREB6.04
AREB6 (Atp1a1 regulatory element binding factor 6)

V$GATA/GATA3.02
GATA-binding factor 3

V$FKHD/HNF3B.01
Hepatocyte Nuclear Factor 3beta

V$IRFF/IRF1.01
interferon regulatory factor 1

V$NKXH/NKX31.01
prostate-specific homeodomain protein NKX3.1

V$PBXF/PBX1.01
homeo domain factor Pbx-1

V$ECAT/NFY.03
nuclear factor Y (Y-box binding factor)

V$PBXC/PBX1_MEIS1.02
Binding site for a Pbx1/Meis1 heterodimer

V$CLOX/CDP.02
transcriptional repressor CDP

V$HOXT/MEIS1_HOXA9.01
Homeobox protein MEIS1 binding site

V$HOXF/HOXA9.01
Member of the vertebrate HOX-cluster of homeobox factors

V$GATA/GATA.01
GATA binding site (consensus)

V$NKXH/NKX31.01
prostate-specific homeodomain protein NKX3.1

V$GATA/GATA3.02
GATA-binding factor 3

V$HOXF/CRX.01
Cone-rod homeobox-containing transcription factor / otx-like homeobox gene

V$CART/CART1.01
Cart-1 (cartilage homeoprotein 1)


V$OCT1/OCT1.02
octamer-binding factor 1

MYC-associated zinc finger protein related transcription factor

V$ZBPF/ZBP89.01
Zinc finger transcription factor ZBP-89

V$GATA/GATA3.02
GATA-binding factor 3

V$HOXF/CRX.01
Cone-rod homeobox-containing transcription factor / otx-like homeobox gene

V$CLOX/CDPCR3.01
cut-like homeodomain protein

V$AP1F/VMAF.01
v-Maf

V$AP4R/TAL1ALPHAE47.01
Tal-1alpha/E47 heterodimer

V$PAX8/PAX8.01
PAX 2/5/8 binding site

V$BRAC/BRACH.01
Brachyury

V$GATA/GATA1.02
GATA-binding factor 1

V$RREB/RREB1.01
Ras-responsive element binding protein 1

V$MZF1/MZF1.01
MZF1

V$MOKF/MOK2.02
Ribonucleoprotein associated zinc finger protein MOK-2 (human)

V$HOXF/PTX1.01
Pituitary Homeobox 1 (Ptx1)

V$LTUP/TAACC.01
Lentiviral TATA upstream element

V$AP4R/TH1E47.01
Thing 1/E47 heterodimer, TH1 bHLH member specific expression in a variety of embryonic tissues

Se-Cys tRNA gene transcription activating factor

V$IKRS/IK3.01
Ikaros 3, potential regulator of lymphocyte differentiation

V$AP1F/AP1.01
AP1 binding site

V$MAZF/MAZ.01
Myc associated zinc finger protein (MAZ)

V$MZF1/MZF1.01
MZF1

V$CLOX/CDPCR3.01
cut-like homeodomain protein

V$P53F/P53.01
tumor suppressor p53

V$SMAD/SMAD3.01
Smad3 transcription factor involved in TGF-beta signaling

V$HMTB/MTBF.01
muscle-specific Mt binding site

V$OCT1/OCT1.03
octamer-binding factor 1

V$FKHD/XFD3.01
Xenopus fork head domain factor 3

V$PIT1/PIT1.01
Pit1, GHF-1 pituitary specific pou domain transcription factor

V$OCTP/OCT1P.01
octamer-binding factor 1, POU-specific domain

V$HOXF/HOX1-3.01
Hox-1.3, vertebrate homeobox protein

V$PBXF/PBX1.01
homeo domain factor Pbx-1

V$ECAT/NFY.03
nuclear factor Y (Y-box binding factor)

V$PBXC/PBX1_MEIS1.02
Binding site for a Pbx1/Meis1 heterodimer

V$CLOX/CDP.02
transcriptional repressor CDP

V$HOXT/MEIS1_HOXA9.01
Homeobox protein MEIS1 binding site

V$HOXF/HOXA9.01
Member of the vertebrate HOX-cluster of homeobox factors

V$GATA/GATA1.02
GATA-binding factor 1

V$PCAT/ACAAT.01
Avian C-type LTR CCAAT box

V$XSEC/STAF.01
Se-Cys tRNA gene transcription activating factor

V$OCTP/OCT1P.01
octamer-binding factor 1, POU-specific domain

V$CLOX/CDP.01
cut-like homeodomain protein

V$FAST/FAST1.01
FAST-1 SMAD interacting protein

V$ECAT/NFY.01
nuclear factor Y (Y-box binding factor)

V$MEF2/MMEF2.01
myocyte enhancer factor

V$TBPF/TATA.02
Mammalian C-type LTR TATA box

V$FAST/FAST1.01
FAST-1 SMAD interacting protein

V$LTUP/TAACC.01
Lentiviral TATA upstream element

V$MOKF/MOK2.01
Ribonucleoprotein associated zinc finger protein MOK-2 (mouse)

V$BRNF/BRN2.01
POU factor Brn-2 (N-Oct 3)

V$HOXF/CRX.01
Cone-rod homeobox-containing transcription factor / otx-like homeobox gene

V$NKXH/NKX31.01
prostate-specific homeodomain protein NKX3.1

V$HEN1/HEN1.01
HEN1

V$BEL1/BEL1.01
Bel-1 similar region (defined in Lentivirus LTRs)

V$HOXF/PTX1.01
Pituitary Homeobox 1 (Ptx1)

V$BRNF/BRN2.01
POU factor Brn-2 (N-Oct 3)

V$NFKB/NFKAPPAB.01
NF-kappaB

V$HAML/AML1.01
runt-factor AML-1

V$ZFIA/ZID.01
zinc finger with interaction domain

V$XSEC/STAF.02
Se-Cys tRNA gene transcription activating factor

V$IKRS/IK1.01
Ikaros 1, potential regulator of lymphocyte differentiation

V$FAST/FAST1.01
FAST-1 SMAD interacting protein

V$MOKF/MOK2.01
Ribonucleoprotein associated zinc finger protein MOK-2 (mouse)


V$BEL1/BEL1.01
Bel-1 similar region (defined in Lentivirus LTRs)

V$EGRF/WT1.01
Wilms Tumor Suppressor

V$MAZF/MAZR.01
MYC-associated zinc finger protein related transcription factor

V$ZBPF/ZBP89.01
Zinc finger transcription factor ZBP-89

V$ZBPF/ZBP89.01
Zinc finger transcription factor ZBP-89

V$SP1F/GC.01
GC box elements

V$RREB/RREB1.01
Ras-responsive element binding protein 1

V$MOKF/MOK2.01
Ribonucleoprotein associated zinc finger protein MOK-2 (mouse)

V$MEIS/MEIS1.01
Binding site for monomeric Meis1 homeodomain protein

V$BCL6/BCL6.02
POZ/zinc finger protein, transcriptional repressor, translocations observed in diffuse large cell lymphoma

V$GATA/GATA3.02
GATA-binding factor 3

V$HOXF/CRX.01
Cone-rod homeobox-containing transcription factor / otx-like homeobox gene

V$HOXF/CRX.01
Cone-rod homeobox-containing transcription factor / otx-like homeobox gene

V$MAZF/MAZR.01
MYC-associated zinc finger protein related transcription factor

V$MZF1/MZF1.01
MZF1

V$PDX1/PDX1.01
Pdx1 (IDX1/IPF1) pancreatic and intestinal homeodomain TF



**matches are listed in order of occurrence in the corresponding sequence 

TFBS in hluc+ver2B3 

After removal of TFBS from hluc+ver2B2 = before removal of TFBS from hluc+ver2B3 (35 matches)







V$OCT1/OCT1.04
octamer-binding factor 1

V$BARB/BARBIE.01
barbiturate-inducible element

V$NFKB/NFKAPPAB.02
NF-kappaB

V$OCTP/OCT1P.01
octamer-binding factor 1, POU-specific domain

V$PIT1/PIT1.01
Pit1, GHF-1 pituitary specific pou domain transcription factor

V$HOXF/PTX1.01
Pituitary Homeobox 1 (Ptx1)

V$FKHD/FREAC4.01
Fork head RElated ACtivator-4

V$E4FF/E4F.01
GLI-Krueppel-related transcription factor, regulator of adenovirus E4 promoter

V$EVI1/EVI1.02
Ecotropic viral integration site 1 encoded factor

V$GATA/GATA2.01
GATA-binding factor 2

V$GREF/PRE.01
Progesterone receptor binding site

V$RBPF/RBPJK.01
Mammalian transcriptional repressor RBP-Jkappa/CBF1

V$STAT/STAT.01
signal transducers and activators of transcription

V$IKRS/IK2.01
Ikaros 2, potential regulator of lymphocyte differentiation


V$FKHD/FREAC2.01
Fork head RElated ACtivator-2

V$SRFF/SRF.01
serum response factor

V$GREF/PRE.01
Progesterone receptor binding site


V$CLOX/CDPCR3.01
cut-like homeodomain protein


V$AP4R/TAL1ALPHAE47.01
Tal-1alpha/E47 heterodimer

V$GATA/GATA1.02
GATA-binding factor 1

V$FKHD/XFD3.01
Xenopus fork head domain factor 3

V$PBXF/PBX1.01
homeo domain factor Pbx-1

V$ECAT/NFY.03
nuclear factor Y (Y-box binding factor)

V$PBXC/PBX1_MEIS1.02
Binding site for a Pbx1/Meis1 heterodimer

V$CLOX/CDP.02
transcriptional repressor CDP

V$HOXT/MEIS1_HOXA9.01
Homeobox protein MEIS1 binding site

V$HOXF/HOXA9.01
Member of the vertebrate HOX-cluster of homeobox factors

V$GATA/GATA1.02
GATA-binding factor 1

V$MINI/MUSCLE_INI.01
Muscle Initiator Sequence

V$CLOX/CDP.01
cut-like homeodomain protein

V$BRNF/BRN2.01
POU factor Brn-2 (N-Oct 3)

V$NFKB/NFKAPPAB.01
NF-kappaB

V$ZFIA/ZID.01
zinc finger with interaction domain

V$BCL6/BCL6.02
POZ/zinc finger protein, transcriptional repressor, translocations observed in diffuse large cell lymphoma

V$HOXF/CRX.01
Cone-rod homeobox-containing transcription factor / otx-like homeobox gene



**matches are listed in order of occurrence in the corresponding sequence 
TFBS in hluc+ver2B6

After removal of TFBS from hluc+ver2B5 (2 matches)

V$HOXF/PTX1.01
Pituitary Homeobox 1 (Ptx1)

V$FKHD/XFD3.01
Xenopus fork head domain factor 3

**matches are listed in order of occurrence in the corresponding sequence

TFBS in hluc+ver2B6

Before removal of TFBS from hluc+ver2B6 (6 matches)

V$PAX6/PAX4_PD.01
PAX4 paired domain binding site

V$HOXF/PTX1.01
Pituitary Homeobox 1 (Ptx1)

V$FKHD/XFD3.01
Xenopus fork head domain factor 3

V$PAX6/PAX6.02
PAX6 paired domain and homeodomain are required for binding to this site

V$PAX5/PAX5.03
PAX5 paired domain protein

V$IRFF/IRF3.01
Interferon regulatory factor 3 (IRF-3)

**matches are listed in order of occurrence in the corresponding sequence

TFBS in hluc+ver2B7

After removal of TFBS from hluc+ver2B6 = before removal of TFBS from hluc+ver2B7 (2 matches)

V$HOXF/PTX1.01
Pituitary Homeobox 1 (Ptx1)

V$FKHD/XFD3.01
Xenopus fork head domain factor 3

**matches are listed in order of occurrence in the corresponding sequence

TFBS in hluc+ver2B8

After removal of TFBS from hluc+ver2B7 = before removal of TFBS from hluc+ver2B8 (1 match)

V$FKHD/XFD3.01
Xenopus fork head domain factor 3

TFBS in hluc+ver2B9

After removal of TFBS from hluc+ver2B8 = before removal of TFBS from hluc+ver2B9 (1 match)

V$FKHD/XFD3.01
Xenopus fork head domain factor 3

TFBS in hluc+ver2B10

After removal of TFBS from hluc+ver2B9 (1 match)

V$FKHD/XFD3.01
Xenopus fork head domain factor 3



Example 8

Summary of Design for pGL4 Sequences

Figure 2 depicts the design scheme for the pGL4 vector. A portion of the vector backbone in pGL3, which includes a bla gene and a sequence between bla and a multiple cloning region but not a second open reading frame, was modified to yield pGL4. pGL4 includes an ampicillin resistance gene between a NotI and a SpeI site, the sequence of which was modified to remove regulatory sequences but not to optimize codons for mammalian expression (bla-1 to bla-5), and a SpeI-NcoI fragment that includes a multiple cloning region and a translation trap. The translation trap includes about 60 nucleotides having at least two stop codons in each reading frame. The SpeI-NcoI fragment from a parent vector, pGL4-basics-5F2G-2, was modified to decrease undesired regulatory sequences (MCS-1 to MCS-4; SEQ ID Nos. 76-79). One of the resulting sequences, MCS-4, was combined with a modified ampicillin resistance gene, bla-5 (SEQ ID NO:84), to yield pGL4B-4NN (SEQ ID NO:95). pGL4B-4NN was further modified (pGL4-NN1-3; SEQ ID Nos. 96-98). To determine whether additional polyA sequences in the SpeI-NcoI fragment further reduced expression from the vector backbone, various polyA sequences were inserted into that fragment. For instance, pGL4NN-Blue Heron included a c-mos polyA sequence in the SpeI-NcoI fragment. However, removing regulatory sequences from polyA sequences may alter the secondary structure, and thus the function, of those sequences.
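The translation-trap criterion stated above (at least two stop codons in each reading frame) can be checked mechanically. The Python sketch below is illustrative only: it assumes the three forward reading frames of the sense strand are meant, and the function names and the short example sequence are hypothetical, not the actual pGL4 trap sequence.

```python
from typing import List

# Stop codons of the standard genetic code, written in the DNA alphabet.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stops_per_frame(seq: str) -> List[int]:
    """Count stop codons in each of the three forward reading frames.

    Assumption: only the sense strand is scanned; an incomplete codon
    at the 3' end of a frame is ignored.
    """
    seq = seq.upper()
    counts = []
    for frame in range(3):
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        counts.append(sum(codon in STOP_CODONS for codon in codons))
    return counts


def is_translation_trap(seq: str, min_stops: int = 2) -> bool:
    """True if every forward frame contains at least `min_stops` stop codons."""
    return all(n >= min_stops for n in stops_per_frame(seq))


if __name__ == "__main__":
    # Hypothetical 20-nt example (three TAATAA blocks offset by single Cs).
    example = "TAATAACTAATAACTAATAA"
    print(stops_per_frame(example), is_translation_trap(example))
    # prints: [2, 2, 2] True
```

A trap of about 60 nucleotides would be passed to the same function; the single C between blocks in the example simply shifts each TAATAA pair into the next reading frame.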



178 



WO 2006/034061 



PCT/US2005/033218 



In one vector, the SpeI-NcoI fragment from pGL3 (SpeI-NcoI start ver 2; SEQ ID NO:48) was modified to remove one transcription factor binding site and one restriction enzyme recognition site, and to alter the multiple cloning region, yielding SpeI-NcoI ver2 (SEQ ID NO:49).

TF binding sites and search parameters

Each TF binding site ("matrix") belongs to a matrix family that groups functionally similar matrices together, so that redundant matches are eliminated by MatInspector professional (the search program). Searches were limited to vertebrate TF binding sites. Searches were performed by matrix family, i.e., the results show only the best match from a family for each site. MatInspector default parameters were used for the core and matrix similarity values (core similarity = 0.75, matrix similarity = optimized), except for sequence MCS-1 (core similarity = 1.00, matrix similarity = optimized).
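The per-family reporting and the two similarity cut-offs described above can be pictured as a small filter over precomputed hits. This Python sketch does not reproduce MatInspector's scoring: it assumes each hit already carries a core similarity and a matrix similarity, models the "optimized" matrix similarity as a hypothetical per-matrix threshold table, and treats a "site" as a start position. The `Hit` class, the `best_per_family` function and the example values are all illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Hit:
    family: str        # matrix family, e.g. "V$GATA"
    matrix: str        # individual matrix, e.g. "GATA1.02"
    position: int      # start of the match in the scanned sequence
    core_sim: float    # core similarity (0..1)
    matrix_sim: float  # matrix similarity (0..1)


def best_per_family(
    hits: Iterable[Hit],
    core_threshold: float,
    matrix_thresholds: Dict[str, float],
) -> List[Hit]:
    """Keep hits passing both cut-offs, then the best hit per family and site."""
    best: Dict[Tuple[str, int], Hit] = {}
    for hit in hits:
        if hit.core_sim < core_threshold:
            continue
        # Hypothetical stand-in for a matrix-specific "optimized" threshold.
        if hit.matrix_sim < matrix_thresholds.get(hit.matrix, 0.0):
            continue
        key = (hit.family, hit.position)
        current = best.get(key)
        if current is None or hit.matrix_sim > current.matrix_sim:
            best[key] = hit
    # Report matches in order of occurrence, as in the tables.
    return sorted(best.values(), key=lambda h: h.position)


if __name__ == "__main__":
    hits = [
        Hit("V$GATA", "GATA1.02", 10, 0.90, 0.95),
        Hit("V$GATA", "GATA3.02", 10, 1.00, 0.97),
        Hit("V$NKXH", "NKX31.01", 20, 0.80, 0.92),
        Hit("V$PAX6", "PAX6.01", 30, 0.70, 0.99),
        Hit("V$TEAF", "TEF1.01", 40, 1.00, 0.80),
    ]
    thresholds = {"TEF1.01": 0.85}
    default = best_per_family(hits, 0.75, thresholds)
    strict = best_per_family(hits, 1.00, thresholds)
    print([h.matrix for h in default], [h.matrix for h in strict])
    # prints: ['GATA3.02', 'NKX31.01'] ['GATA3.02']
```

In this sketch, raising the core threshold to 1.00 (the setting used for MCS-1) admits only hits whose core matches perfectly, so the stricter search reports fewer matches.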


Table 25

Description of Designed Sequences

pGL4 sequences

Name | Description | Matrix library

SpeI-NcoI fragment with MCS, translation trap
MCS-1 | SpeI-NcoI from pGL4-basics-5F2G-2 | Ver 2.2 Sep 2001
MCS-2 | First removal of undesired sequence matches | Ver 2.2 Sep 2001
MCS-3 | Second removal of undesired sequence matches | Ver 2.2 Sep 2001
MCS-4 | Third removal of undesired sequence matches | Ver 2.3 Feb 2001

NotI-SpeI fragment with bla gene
Bla | Beta-lactamase gene from pGL3 vectors |
bla-1* | SacR (RE) added, BsmAI (RE) site removed (*) | Ver 2.2 Sep 2001
bla-2* | First removal of undesired sequence matches | Ver 2.3 Feb 2001
bla-3* | Second removal of undesired sequence matches | Ver 2.3 Feb 2001
bla-4* | Third removal of undesired sequence matches | Ver 2.3 Feb 2001
bla-5* | Fourth removal of undesired sequence matches | Ver 2.3 Feb

NotI-NcoI fragment with bla, translation trap, MCS
pGL4B-4NN | Combination of bla-5 and MCS-4 sections | Ver 2.4 May 2002
pGL4B-4NN1 | First removal of undesired sequence matches | Ver 2.4 May 2002
pGL4B-4NN2 | Second removal of undesired sequence matches | Ver 2.4 May 2002
pGL4B-4NN3 | Third version after removal of CEBP site | Ver 2.4 May

SpeI-NcoI fragment with translation trap, poly A, MCS
SpeI-NcoI-Ver2-start | Existing MCS replaced with new MCS | Ver 4.0 Nov 2003
SpeI-NcoI-Ver1 | First removal of undesired sequence matches | Ver 4.0 Nov 2003

(*) Bla codon usage was not optimized for expression in mammalian cells. Low usage E. coli codons were avoided when changes were introduced to remove undesired sequence elements.


Table 26

Sequences in Synthetic SpeI-NcoI Fragment of pGL4
TFBS in MCS-1

Before removal of TFBS from MCS-1 (14 matches)

V$PAX3/PAX3.01
Pax-3 paired domain protein, expressed in embryogenesis, mutations correlate to Waardenburg Syndrome

V$GATA/GATA.01
GATA binding site (consensus)

V$NKXH/NKX31.01
prostate-specific homeodomain protein NKX3.1

V$CREB/E4BP4.01
E4BP4, bZIP domain, transcriptional repressor

V$BRN2/BRN2.01
POU factor Brn-2 (N-Oct 3)

V$CREB/E4BP4.01
E4BP4, bZIP domain, transcriptional repressor

V$NKXH/NKX31.01
prostate-specific homeodomain protein NKX3.1

V$ZFIA/ZID.01
zinc finger with interaction domain

V$CP2F/CP2.01
CP2

V$BRAC/BRACH.01
Brachyury

V$PAX6/PAX6.01
Pax-6 paired domain protein

V$NKXH/NKX31.01
prostate-specific homeodomain protein NKX3.1

V$TEAF/TEF1.01
TEF-1 related muscle factor

V$ETSF/ELK1.02
Elk-1



**matches are listed in order of occurrence in the corresponding sequence 
TFBS in MCS-2

After removal of TFBS from MCS-1 = before removal of TFBS from MCS-2 (12 matches)

V$GATA/GATA.01
GATA binding site (consensus)

V$NKXH/NKX31.01
prostate-specific homeodomain protein NKX3.1

V$TBPF/ATATA.01
Avian C-type LTR TATA box

V$CART/CART1.01
Cart-1 (cartilage homeoprotein 1)

V$CREB/E4BP4.01
E4BP4, bZIP domain, transcriptional repressor

V$BRN2/BRN2.01
POU factor Brn-2 (N-Oct 3)

V$CREB/E4BP4.01
E4BP4, bZIP domain, transcriptional repressor

V$TBPF/ATATA.01
Avian C-type LTR TATA box

V$NKXH/NKX31.01
prostate-specific homeodomain protein NKX3.1

V$PAX6/PAX6.01
Pax-6 paired domain protein

V$PAX8/PAX8.01
PAX 2/5/8 binding site

V$PAX1/PAX1.01
Pax1 paired domain protein, expressed in the developing vertebral column of mouse embryos



**matches are listed in order of occurrence in the corresponding sequence 
TFBS in MCS-3

After removal of TFBS from MCS-2 = before removal of TFBS from MCS-3 (0 matches)

TFBS in MCS-4

After removal of TFBS from MCS-3 (0 matches)
Table 27

Sequences in Synthetic NotI-SpeI Fragment of pGL4

TFBS in bla-1

Before removal of TFBS from bla-1 (94 matches)

V$GATA/GATA1.02
GATA-binding factor 1

V$HOXF/HOX1-3.01
Hox-1.3, vertebrate homeobox protein

V$TBPF/ATATA.01
Avian C-type LTR TATA box

V$ETSF/NRF2.01
nuclear respiratory factor 2

V$OCTP/OCT1P.01
octamer-binding factor 1, POU-specific domain

V$ETSF/ELK1.02
Elk-1

V$GKLF/GKLF.01
gut-enriched Krueppel-like factor

V$E2FF/E2F.02
E2F, involved in cell cycle regulation, interacts with Rb p107 protein

V$ETSF/NRF2.01
nuclear respiratory factor 2

V$AP1F/VMAF.01
v-Maf

V$XBBF/RFX1.01
X-box binding protein RFX1

V$AREB/AREB6.04
AREB6 (Atp1a1 regulatory element binding factor 6)

V$CMYB/CMYB.01
c-Myb, important in hematopoiesis, cellular equivalent to avian myoblastosis virus oncogene v-myb

V$VMYB/VMYB.02
v-Myb

V$EBOX/NMYC.01
N-Myc

V$VBPF/VBP.01
PAR-type chicken vitellogenin promoter-binding protein

V$CMYB/CMYB.01
c-Myb, important in hematopoiesis, cellular equivalent to avian myoblastosis virus oncogene v-myb


V$GATA/GATA3.02
GATA-binding factor 3

V$PAX8/PAX8.01
PAX 2/5/8 binding site

V$HNF4/HNF4.02
Hepatic nuclear factor 4

V$E2FF/E2F.01
E2F, involved in cell cycle regulation, interacts with Rb p107 protein

V$NFAT/NFAT.01
Nuclear factor of activated T-cells

V$ECAT/NFY.02
nuclear factor Y (Y-box binding factor)

V$TBPF/TATA.02
Mammalian C-type LTR TATA box

V$MYT1/MYT1.02
MyT1 zinc finger transcription factor involved in primary neurogenesis

V$GATA/GATA3.01
GATA-binding factor 3

V$CREB/CREB.02
cAMP-responsive element binding protein

V$WHZF/WHN.01
winged helix protein, involved in hair keratinization and thymus epithelium differentiation

V$IRFF/ISRE.01
interferon-stimulated response element

V$NRSF/NRSE.01
neural-restrictive-silencer-element

V$TCFF/TCF11.01
TCF11/KCR-F1/Nrf1 homodimers

V$STAT/STAT.01
signal transducers and activators of transcription

V$ECAT/NFY.03
nuclear factor Y (Y-box binding factor)

V$OCT1/OCT1.05
octamer-binding factor 1

V$OCTP/OCT1P.01
octamer-binding factor 1, POU-specific domain

V$NKXH/NKX25.02
homeo domain factor Nkx-2.5/Csx, tinman homolog low affinity sites


V$PIT1/PIT1.01
Pit1, GHF-1 pituitary specific pou domain transcription factor

V$CLOX/CDPCR3.01
cut-like homeodomain protein

V$GREF/ARE.01
Androgen receptor binding site

V$GATA/GATA1.04
GATA-binding factor 1

V$E2TF/E2.02
papilloma virus regulator E2

V$RPOA/POLYA.01
Mammalian C-type LTR Poly A signal

V$VMYB/VMYB.02
v-Myb

V$CEBP/CEBPB.01
CCAAT/enhancer binding protein beta

V$VBPF/VBP.01
PAR-type chicken vitellogenin promoter-binding protein

V$CREB/HLF.01
hepatic leukemia factor

V$SF1F/SF1.01
SF1 steroidogenic factor 1

V$XBBF/MIF1.01
MIBP-1 / RFX1 complex

V$IKRS/IK2.01
Ikaros 2, potential regulator of lymphocyte differentiation

V$MINI/MUSCLE_INI.02
Muscle Initiator Sequence

V$PCAT/CLTR_CAAT.01
Mammalian C-type LTR CCAAT box

V$PAX5/PAX5.01
B-cell-specific activating protein

V$RPAD/PADS.01
Mammalian C-type LTR Poly A downstream element

V$XBBF/RFX1.02
X-box binding protein RFX1

V$CEBP/CEBPB.01
CCAAT/enhancer binding protein beta

V$CREB/HLF.01
hepatic leukemia factor

V$HNF1/HNF1.01
hepatic nuclear factor 1
V$VMYB/VMYB.01
v-Myb

V$NKXH/NKX31.01
prostate-specific homeodomain protein NKX3.1

V$XBBF/RFX1.01
X-box binding protein RFX1

V$STAT/STAT.01
signal transducers and activators of transcription

V$HNF1/HNF1.01
hepatic nuclear factor 1

V$HMYO/S8.01
S8

V$SORY/SOX5.01
Sox-5


V$BRIG/BRIGHT.01
Bright, B cell regulator of IgH transcription


V$NKXH/NKX25.02
homeo domain factor Nkx-2.5/Csx, tinman homolog low affinity sites

V$GATA/GATA1.02
GATA-binding factor 1

V$BARB/BARBIE.01
barbiturate-inducible element

V$MTF1/MTF-1.01
Metal transcription factor 1, MRE

V$NFKB/CREL.01
c-Rel

V$ETSF/ELK1.02
Elk-1

V$CLOX/CDP.01
cut-like homeodomain protein

V$RPOA/LPOLYA.01
Lentiviral Poly A signal

V$GATA/GATA1.03
GATA-binding factor 1

V$ZFIA/ZID.01
zinc finger with interaction domain

V$WHZF/WHN.01
winged helix protein, involved in hair keratinization and thymus epithelium differentiation

V$PAX1/PAX1.01
Pax1 paired domain protein, expressed in the developing vertebral column of mouse embryos

V$GATA/LMO2COM.02
complex of Lmo2 bound to Tal-1, E2A proteins, and GATA-1, half-site 2




V$NRSF/NRSF.01
neuron-restrictive silencer factor

V$AP4R/TAL1BETAE47.01
Tal-1beta/E47 heterodimer

V$GATA/LMO2COM.02
complex of Lmo2 bound to Tal-1, E2A proteins, and GATA-1, half-site 2

V$GATA/GATA1.02
GATA-binding factor 1

V$XBBF/RFX1.01
X-box binding protein RFX1

V$AHRR/AHRARNT.02
aryl hydrocarbon / Arnt heterodimers, fixed core

V$PAX5/PAX9.01
zebrafish PAX9 binding sites

V$CLOX/CDP.02
transcriptional repressor CDP

V$GATA/GATA1.01
GATA-binding factor 1

V$AP1F/TCF11MAFG.01
TCF11/MafG heterodimers, binding to subclass of AP1 sites

V$BRN2/BRN2.01
POU factor Brn-2 (N-Oct 3)

V$NKXH/NKX25.02
homeo domain factor Nkx-2.5/Csx, tinman homolog low affinity sites

V$ECAT/NFY.02
nuclear factor Y (Y-box binding factor)

V$FKHD/FREAC4.01
Fork head RElated ACtivator-4

V$NFAT/NFAT.01
Nuclear factor of activated T-cells

V$IRFF/IRF1.01
interferon regulatory factor 1

V$E2FF/E2F.02
E2F, involved in cell cycle regulation, interacts with Rb p107 protein



**matches are listed in order of occurrence in the corresponding sequence 



TFBS in bla-2

After removal of TFBS from bla-1 = before removal of TFBS from bla-2 (51 matches)







V$GATA/GATA1.02
GATA-binding factor 1

V$ETSF/NRF2.01
nuclear respiratory factor 2

V$OCTP/OCT1P.01
octamer-binding factor 1, POU-specific domain

V$ETSF/ELK1.02
Elk-1


V 4>HyJjW.AyiNlVl I 


in ~ivi y u 


V$GATA/GATA3.02
GATA-binding factor 3

V$PAX8/PAX8.01
PAX 2/5/8 binding site

Hepatic nuclear factor 4

V$E2FF/E2F.01
E2F, involved in cell cycle regulation, interacts with Rb p107 protein

V$NFAT/NFAT.01
Nuclear factor of activated T-cells

V$ECAT/NFY.02
nuclear factor Y (Y-box binding factor)

V$TBPF/TATA.02
Mammalian C-type LTR TATA box

V$MYT1/MYT1.02
MyT1 zinc finger transcription factor involved in primary neurogenesis

V$GATA/GATA3.01
GATA-binding factor 3

V$CREB/CREB.02
cAMP-responsive element binding protein

V$WHZF/WHN.01
winged helix protein, involved in hair keratinization and thymus epithelium differentiation

V$NRSF/NRSE.01
neural-restrictive-silencer-element

V$OCT1/OCT1.05
octamer-binding factor 1

V$CLOX/CDPCR3.01
cut-like homeodomain protein
V$GREF/ARE.01
Androgen receptor binding site

V$GATA/GATA1.04
GATA-binding factor 1

V$CEBP/CEBPB.01
CCAAT/enhancer binding protein beta

V$CREB/HLF.01
hepatic leukemia factor

V$VBPF/VBP.01
PAR-type chicken vitellogenin promoter-binding protein

V$XBBF/MIF1.01
MIBP-1 / RFX1 complex

V$IKRS/IK2.01
Ikaros 2, potential regulator of lymphocyte differentiation

V$PAX5/PAX5.01
B-cell-specific activating protein

V$XBBF/RFX1.02
X-box binding protein RFX1

V$CEBP/CEBPB.01
CCAAT/enhancer binding protein beta

V$CREB/HLF.01
hepatic leukemia factor

V$XBBF/RFX1.02
X-box binding protein RFX1

V$GATA/GATA1.02
GATA-binding factor 1

V$BARB/BARBIE.01
barbiturate-inducible element

V$MTF1/MTF-1.01
Metal transcription factor 1, MRE

V$NFKB/CREL.01
c-Rel

V$ETSF/ELK1.02
Elk-1

V$TBPF/TATA.01
cellular and viral TATA box elements

V$MEIS/MEIS1.01
Homeobox protein MEIS1 binding site

V$HOXF/HOXA9.01
Member of the vertebrate HOX-cluster of homeobox factors

V$GATA/GATA1.03
GATA-binding factor 1

V$MEIS/MEIS1.01
Homeobox protein MEIS1 binding site

V$NOLF/OLF1.01
olfactory neuron-specific factor

V$AP4R/TAL1BETAE47.01
Tal-1beta/E47 heterodimer

V$GATA/GATA1.02
GATA-binding factor 1

V$XBBF/RFX1.01
X-box binding protein RFX1

V$AHRR/AHRARNT.02
aryl hydrocarbon / Arnt heterodimers, fixed core

V$PAX5/PAX9.01
zebrafish PAX9 binding sites

V$CLOX/CDP.02
transcriptional repressor CDP

V$GATA/GATA1.01
GATA-binding factor 1

V$IRFF/IRF1.01
interferon regulatory factor 1

V$E2FF/E2F.02
E2F, involved in cell cycle regulation, interacts with Rb p107 protein



**matches are listed in order of occurrence in the corresponding sequence 
TFBS in bla-3

After removal of TFBS from bla-2 = before removal of TFBS from bla-3 (16 matches)







V$ETSF/NRF2.01
nuclear respiratory factor 2

V$E2FF/E2F.02
E2F, involved in cell cycle regulation, interacts with Rb p107 protein

V$NFAT/NFAT.01
Nuclear factor of activated T-cells

V$TBPF/TATA.02
Mammalian C-type LTR TATA box

V$MYT1/MYT1.02
MyT1 zinc finger transcription factor involved in primary neurogenesis







V$WHZF/WHN.01
winged helix protein, involved in hair keratinization and thymus epithelium differentiation

V$SORY/SOX5.01
Sox-5

V$CEBP/CEBPB.01
CCAAT/enhancer binding protein beta

V$CREB/HLF.01
hepatic leukemia factor

V$VBPF/VBP.01
PAR-type chicken vitellogenin promoter-binding protein

V$PAX5/PAX5.01
B-cell-specific activating protein

V$XBBF/RFX1.02
X-box binding protein RFX1

V$CREB/HLF.01
hepatic leukemia factor

V$GATA/GATA1.03
GATA-binding factor 1

V$MEIS/MEIS1.01
Homeobox protein MEIS1 binding site

V$NOLF/OLF1.01
olfactory neuron-specific factor



**matches are listed in order of occurrence in the corresponding sequence 
TFBS in bla-4

After removal of TFBS from bla-3 = before removal of TFBS from bla-4 (14 matches)

V$ETSF/NRF2.01
nuclear respiratory factor 2

V$NFAT/NFAT.01
Nuclear factor of activated T-cells

V$WHZF/WHN.01
winged helix protein, involved in hair keratinization and thymus epithelium differentiation

V$GATA/GATA3.01
GATA-binding factor 3

V$CEBP/CEBPB.01
CCAAT/enhancer binding protein beta

V$EBOX/USF.02
upstream stimulating factor

V$PAX5/PAX5.01
B-cell-specific activating protein

V$XBBF/RFX1.02
X-box binding protein RFX1

V$GATA/GATA1.03
GATA-binding factor 1

V$MEIS/MEIS1.01
Homeobox protein MEIS1 binding site

V$ZFIA/ZID.01
zinc finger with interaction domain

V$WHZF/WHN.01
winged helix protein, involved in hair keratinization and thymus epithelium differentiation

V$PAX1/PAX1.01
Pax1 paired domain protein, expressed in the developing vertebral column of mouse embryos

V$GATA/LMO2COM.02
complex of Lmo2 bound to Tal-1, E2A proteins, and GATA-1, half-site 2



**matches are listed in order of occurrence in the corresponding sequence 

TFBS in bla-5

After removal of TFBS from bla-4 (5 matches)

V$ETSF/NRF2.01
nuclear respiratory factor 2

V$WHZF/WHN.01
winged helix protein, involved in hair keratinization and thymus epithelium differentiation

V$GATA/GATA3.01
GATA-binding factor 3

V$CEBP/CEBPB.01
CCAAT/enhancer binding protein beta

V$EBOX/USF.02
upstream stimulating factor



**matches are listed in order of occurrence in the corresponding sequence

Table 28 

Sequences in Synthetic NotI-NcoI Fragment of pGL4
TFBS in pGL4B-4NN
Before removal of TFBS from pGL4B-4NN (11 matches)

V$SMAD/FAST1.01
FAST-1 SMAD interacting protein

V$SMAD/FAST1.01
FAST-1 SMAD interacting protein

V$ETSF/FLI.01
ETS family member FLI

V$RBPF/RBPJK.01
Mammalian transcriptional repressor RBP-Jkappa/CBF1

V$ETSF/FLI.01
ETS family member FLI

V$EBOX/USF.02
upstream stimulating factor

CCAAT/enhancer binding protein beta

V$GATA/GATA3.01
GATA-binding factor 3

V$WHZF/WHN.01
winged helix protein, involved in hair keratinization and thymus epithelium differentiation

V$ETSF/NRF2.01
nuclear respiratory factor 2

V$TBPF/ATATA.01
Avian C-type LTR TATA box



**matches are listed in order of occurrence in the corresponding sequence 

TFBS in pGL4B-4NN1

After removal of TFBS from pGL4B-4NN = before removal of TFBS from pGL4B-4NN1 (7 matches)

Family/matrix           Further information
V$ETSF/NRF2.01          nuclear respiratory factor 2
V$WHZF/WHN.01           winged helix protein, involved in hair keratinization and thymus epithelium differentiation
V$CEBP/CEBPB.01         CCAAT/enhancer binding protein beta
V$EBOX/USF.02           upstream stimulating factor
V$ETSF/FLI.01           ETS family member FLI
V$SMAD/FAST1.01         FAST-1 SMAD interacting protein
V$SMAD/FAST1.01         FAST-1 SMAD interacting protein

**matches are listed in order of occurrence in the corresponding sequence

TFBS in pGL4B-4NN2

After removal of TFBS from pGL4B-4NN1 = before removal of TFBS from pGL4B-4NN2 (4 matches)

Family/matrix           Further information
V$ETSF/NRF2.01          nuclear respiratory factor 2
V$WHZF/WHN.01           winged helix protein, involved in hair keratinization and thymus epithelium differentiation
V$CEBP/CEBPB.01         CCAAT/enhancer binding protein beta
V$EBOX/USF.02           upstream stimulating factor

**matches are listed in order of occurrence in the corresponding sequence
TFBS in pGL4B-4NN3

After removal of TFBS from pGL4B-4NN2 (3 matches)

Family/matrix           Further information
V$EBOX/USF.02           upstream stimulating factor
V$WHZF/WHN.01           winged helix protein, involved in hair keratinization and thymus epithelium differentiation
V$ETSF/NRF2.01          nuclear respiratory factor 2

**matches are listed in order of occurrence in the corresponding sequence
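In Table 28 each round's "after" list is the next round's "before" list, so the matrices eliminated by one round of sequence changes can be read off by comparing two consecutive lists. A minimal sketch, assuming only matrix names are available (the tables give no positions), treats each list as a multiset and subtracts. The function name `compare_rounds` is hypothetical, and because positions are missing the comparison cannot say which copy of a repeated matrix was the one removed.

```python
from collections import Counter


def compare_rounds(before: list[str], after: list[str]) -> tuple[Counter, Counter]:
    """Compare two ordered match lists by matrix name only.

    Returns (removed, gained): matrices whose hit count fell between the two
    scans, and matrices whose hit count rose (e.g. a site created by a change).
    """
    b, a = Counter(before), Counter(after)
    return b - a, a - b


# Table 28: pGL4B-4NN (11 matches) versus pGL4B-4NN1 (7 matches).
before_4nn = [
    "V$SMAD/FAST1.01", "V$SMAD/FAST1.01", "V$ETSF/FLI.01", "V$RBPF/RBPJK.01",
    "V$ETSF/FLI.01", "V$EBOX/USF.02", "V$CEBP/CEBPB.01", "V$GATA/GATA3.01",
    "V$WHZF/WHN.01", "V$ETSF/NRF2.01", "V$TBPF/ATATA.01",
]
after_4nn = [
    "V$ETSF/NRF2.01", "V$WHZF/WHN.01", "V$CEBP/CEBPB.01", "V$EBOX/USF.02",
    "V$ETSF/FLI.01", "V$SMAD/FAST1.01", "V$SMAD/FAST1.01",
]

removed, gained = compare_rounds(before_4nn, after_4nn)
print(sorted(removed.elements()))
print(dict(gained))
```

Applied to pGL4B-4NN and pGL4B-4NN1, the difference accounts for all four lost matches (11 minus 7); a non-empty `gained` counter would flag a site newly created by a change.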

Table 29

Sequences in Synthetic SpeI-NcoI section of pGL4

TFBS in SpeI-NcoI-Ver2-start

Before removal of TFBS from SpeI-NcoI-Ver2-start (34 matches)

Family/matrix           Further information
V$PAX8/PAX8.01          PAX 2/5/8 binding site
V$GATA/GATA1.02         GATA-binding factor 1
V$CREB/E4BP4.01         E4BP4, bZIP domain, transcriptional repressor
V$NKXH/NKX31.01         Prostate-specific homeodomain protein NKX3.1
V$TBPF/ATATA.01         Avian C-type LTR TATA box
V$CREB/E4BP4.01         E4BP4, bZIP domain, transcriptional repressor
V$NKXH/NKX31.01         Prostate-specific homeodomain protein NKX3.1
V$CART/CART1.01         Cart-1 (cartilage homeoprotein 1)
V$NKXH/NKX25.02         Homeo domain factor Nkx-2.5/Csx, tinman homolog low affinity sites
V$ETSF/ELK1.01          Elk-1
V$CDXF/CDX2.01          Cdx-2 mammalian caudal related intestinal transcr. factor
V$BRNF/BRN3.01          POU transcription factor Brn-3
V$TBPF/TATA.02          Mammalian C-type LTR TATA box
V$FKHD/FREAC3.01        Fork head related activator-3 (FOXC1)
V$OCT1/OCT1.02          Octamer-binding factor 1
V$CART/CART1.01         Cart-1 (cartilage homeoprotein 1)
V$PDX1/PDX1.01          Pdx1 (IDX1/IPF1) pancreatic and intestinal homeodomain TF
V$PARF/DBP.01           Albumin D-box binding protein
V$GATA/GATA3.02         GATA-binding factor 3
V$VBPF/VBP.01           PAR-type chicken vitellogenin promoter-binding protein
V$AP4R/TAL1ALPHAE47.01  Tal-1alpha/E47 heterodimer
V$RP58/RP58.01          Zinc finger protein RP58 (ZNF238), associated preferentially with heterochromatin
V$COMP/COMP1.01         COMP1, cooperates with myogenic proteins in multicomponent complex
V$CLOX/CLOX.01          Clox
V$TBPF/ATATA.01         Avian C-type LTR TATA box
V$PBXC/PBX1_MEIS1.02    Binding site for a Pbx1/Meis1 heterodimer
V$PBXF/PBX1.01          Homeo domain factor Pbx-1
V$IRFF/IRF1.01          Interferon regulatory factor 1
V$TEAF/TEF1.01          TEF-1 related muscle factor
V$EBOX/ATF6.01          Member of b-zip family, induced by ER damage/stress, binds to the ERSE in association with NF-Y
V$NKXH/NKX32.01         Homeodomain protein NKX3.2 (BAPX1, NKX3B, Bagpipe homolog)
V$E2TF/E2.02            Papilloma virus regulator E2
V$EVI1/EVI1.05          Ecotropic viral integration site 1 encoded factor
V$GATA/GATA3.02         GATA-binding factor 3

**matches are listed in order of occurrence in the corresponding sequence



TFBS in SpeI-NcoI-Ver2

After removal of TFBS from SpeI-NcoI-Ver2-start (28 matches)

Family/matrix           Further information
V$PAX8/PAX8.01          PAX 2/5/8 binding site
V$GATA/GATA1.02         GATA-binding factor 1
V$CREB/E4BP4.01         E4BP4, bZIP domain, transcriptional repressor
V$NKXH/NKX31.01         Prostate-specific homeodomain protein NKX3.1
V$TBPF/ATATA.01         Avian C-type LTR TATA box
V$CREB/E4BP4.01         E4BP4, bZIP domain, transcriptional repressor
V$NKXH/NKX31.01         Prostate-specific homeodomain protein NKX3.1
V$CART/CART1.01         Cart-1 (cartilage homeoprotein 1)
V$NKXH/NKX25.02         Homeo domain factor Nkx-2.5/Csx, tinman homolog low affinity sites
V$CDXF/CDX2.01          Cdx-2 mammalian caudal related intestinal transcr. factor
V$BRNF/BRN3.01          POU transcription factor Brn-3
V$TBPF/TATA.02          Mammalian C-type LTR TATA box
V$FKHD/FREAC3.01        Fork head related activator-3 (FOXC1)
V$OCT1/OCT1.02          Octamer-binding factor 1
V$CART/CART1.01         Cart-1 (cartilage homeoprotein 1)
V$PDX1/PDX1.01          Pdx1 (IDX1/IPF1) pancreatic and intestinal homeodomain TF
V$PARF/DBP.01           Albumin D-box binding protein
V$GATA/GATA3.02         GATA-binding factor 3
V$VBPF/VBP.01           PAR-type chicken vitellogenin promoter-binding protein
V$AP4R/TAL1ALPHAE47.01  Tal-1alpha/E47 heterodimer
V$RP58/RP58.01          Zinc finger protein RP58 (ZNF238), associated preferentially with heterochromatin
V$COMP/COMP1.01         COMP1, cooperates with myogenic proteins in multicomponent complex
V$CLOX/CLOX.01          Clox
V$TBPF/ATATA.01         Avian C-type LTR TATA box
V$PBXC/PBX1_MEIS1.02    Binding site for a Pbx1/Meis1 heterodimer
V$PBXF/PBX1.01          Homeo domain factor Pbx-1
V$IRFF/IRF1.01          Interferon regulatory factor 1
V$TEAF/TEF1.01          TEF-1 related muscle factor

**matches are listed in order of occurrence in the corresponding sequence



The number of consensus transcription factor binding sites present in the
vector backbone (including the ampicillin resistance gene) was reduced from 224
in pGL3 to 40 in pGL4, and the number of promoter modules was reduced from
10 in pGL3 to 4 in pGL4, using the databases, search programs and related tools
described herein. Other modifications in pGL4 relative to pGL3 include removal
of the f1 origin of replication and redesign of the multiple cloning region.
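The family/matrix names in the tables above (for example V$GATA/GATA3.01) follow the Genomatix matrix-library naming convention. As a much simpler illustration of how a binding-site scan reports hits on both strands in order of occurrence, the sketch below searches for IUPAC consensus strings with regular expressions. It is an assumption-laden stand-in, not the search used here: matrix tools score position weight matrices against similarity thresholds, and the two example consensus strings and the names `scan` and `consensus_regex` are illustrative only.

```python
import re

# IUPAC nucleotide codes written as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each IUPAC code (R<->Y, K<->M, B<->V, D<->H; S, W and N map to themselves).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def consensus_regex(consensus: str) -> "re.Pattern[str]":
    # A zero-width lookahead lets finditer report overlapping hits.
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile(f"(?=({body}))")


def scan(seq: str, motifs: dict[str, str]) -> list[tuple[int, str, str]]:
    """Return (0-based position, motif name, strand) for each hit, in order of occurrence."""
    seq = seq.upper()
    hits = {}
    for name, consensus in motifs.items():
        for strand, site in (("+", consensus), ("-", reverse_complement(consensus))):
            for match in consensus_regex(site).finditer(seq):
                # A palindromic site such as CACGTG matches both strands at the same
                # position; setdefault keeps the first (+) record so it counts once.
                hits.setdefault((match.start(), name), strand)
    return [(pos, name, strand) for (pos, name), strand in sorted(hits.items())]


# Hypothetical example motifs: an E-box (CACGTG) and a GATA consensus (WGATAR).
motifs = {"E-box": "CACGTG", "GATA": "WGATAR"}
print(scan("TTCACGTGAAAGATAAGG", motifs))
# [(2, 'E-box', '+'), (10, 'GATA', '+')]
```

Scanning the reverse complement of each consensus, rather than the reverse complement of the whole sequence, keeps every reported position in the coordinates of the strand given.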



MCS-1 to MCS-4 have the following sequences (SEQ ID NOs:76-79):
MCS-1 

ACTAGTCGTCTCTCTTGAGAGACCGCGATCGCCACCATGATAAGTAA 
GTAATATTAAATAAGTAAGGCCTGAGTGGCCCTCGAGCCAGCCTTGA
GTTGGTTGAGTCCAAGTCACGTCTGGAGATCTGGTACCTACGCGTGA 
GCTCTACGTAGCTAGCGGCCTCGGCGGCCGAATTCTTGCGATCTAAG 
TAAGCTTGGCATTCCGGTACTGTTGGTAAAGCCACCATGG 

MCS-2

ACTAGTACGTCTCTCTTGAGAGACCGCGATCGCCACCATGATAAGTA 
AGTAATATTAAATAAGTAAGGCCTGAGTGGCCCTCGAGTCCAGCCTT 
GAGTTGGTTGAGTCCAAGTCACGTCTGGAGATCTGGTACCTTACGCGT 
AGAGCTCTACGTAGCTAGCGGCCTCGGCGGCCGAATTCTTGCGATCT 
AAGCTTGGCAATCCGGTACTGTTGGTAAAGCCACCATGG

MCS-3 

ACTAGTACGTCTCTCTTGAGAGACCGCGATCGCATGCCTAGGTAGGT 
AGTATTAGAGCATAGGTAGAGGCCTAAGTGGCCCTCGAGTCCAGCCT
TGAGTTGGTTGAGTCCAAGTCACGTCTGGAGATCTGGTACCTTACGCG
TATGAGCTCTACGTAGCTAGCGGCCTCGGCGGCCGAATTCTTGCGAT 
CTAAGCTTGGCAATCCGGTACTGTTGGTAAAGCCACCATGG 

MCS-4 

ACTAGTACGTCTCTCTTGAGAGACCGCGATCGCCACCATGTCTAGGT
AGGTAGTAAACGAAAGGGCTTAAAGGCCTAAGTGGCCCTCGAGTCCA 
GCCTTGAGTTGGTTGAGTCCAAGTCACGTTTGGAGATCTGGTACCTTA
CGCGTATGAGCTCTACGTAGCTAGCGGCCTCGGCGGCCGAATTCTTG
CGATCTAAGCTTGGCAATCCGGTACTGTTGGTAAAGCCACCATGG 
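Each of MCS-1 to MCS-4 begins with ACTAGT (a SpeI site) and ends with CCATGG (an NcoI site). The sketch below lists the 1-based positions of a few recognition sequences in a multiple cloning region so that such features can be checked quickly. The enzyme list and the function name `find_sites` are assumptions for illustration, and only exact, non-degenerate sites are handled.

```python
# Recognition sequences chosen for illustration (all palindromic, so searching
# one strand finds every site). This list is an assumption, not taken from the text.
SITES = {
    "SpeI": "ACTAGT",
    "XhoI": "CTCGAG",
    "KpnI": "GGTACC",
    "NheI": "GCTAGC",
    "HindIII": "AAGCTT",
    "NcoI": "CCATGG",
}


def find_sites(seq: str, sites: dict[str, str] = SITES) -> dict[str, list[int]]:
    """Return the 1-based start position of every occurrence of each site."""
    seq = "".join(seq.split()).upper()  # join wrapped lines and drop whitespace
    found = {}
    for enzyme, site in sites.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start + 1)
            start = seq.find(site, start + 1)
        found[enzyme] = positions
    return found


# MCS-1 as listed above.
mcs1 = """
ACTAGTCGTCTCTCTTGAGAGACCGCGATCGCCACCATGATAAGTAA
GTAATATTAAATAAGTAAGGCCTGAGTGGCCCTCGAGCCAGCCTTGA
GTTGGTTGAGTCCAAGTCACGTCTGGAGATCTGGTACCTACGCGTGA
GCTCTACGTAGCTAGCGGCCTCGGCGGCCGAATTCTTGCGATCTAAG
TAAGCTTGGCATTCCGGTACTGTTGGTAAAGCCACCATGG
"""
for enzyme, positions in find_sites(mcs1).items():
    print(enzyme, positions)
```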

bla has the following sequence: 
ATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCCTTTTTTGCGGCAT

TTTGCCTTCCTGTTTTTGCTCACCCAGAAACGCTGGTGAAAGTAAAAG 

ATGCTGAAGATCAGTTGGGTGCACGAGTGGGTTACATCGAACTGGAT 

CTCAACAGCGGTAAGATCCTTGAGAGTTTTCGCCCCGAAGAACGTTT 

TCCAATGATGAGCACTTTTAAAGTTCTGCTATGTGGCGCGGTATTATC 

CCGTATTGACGCCGGGCAAGAGCAACTCGGTCGCCGCATACACTATT
CTCAGAATGACTTGGTTGAGTACTCACCAGTCACAGAAAAGCATCTT
ACGGATGGCATGACAGTAAGAGAATTATGCAGTGCTGCCATAACCAT 
GAGTGATAACACTGCGGCCAACTTACTTCTGACAACGATCGGAGGAC 
CGAAGGAGCTAACCGCTTTTTTGCACAACATGGGGGATCATGTAACT 

CGCCTTGATCGTTGGGAACCGGAGCTGAATGAAGCCATACCAAACGA
CGAGCGTGACACCACGATGCCTGTAGCAATGGCAACAACGTTGCGCA 
AACTATTAACTGGCGAACTACTTACTCTAGCTTCCCGGCAACAATTAA 
TAGACTGGATGGAGGCGGATAAAGTTGCAGGACCACTTCTGCGCTCG 
GCCCTTCCGGCTGGCTGGTTTATTGCTGATAAATCTGGAGCCGGTGAG 

CGTGGGTCTCGCGGTATCATTGCAGCACTGGGGCCAGATGGTAAGCC
CTCCCGTATCGTAGTTATCTACACGACGGGGAGTCAGGCAACTATGG 
ATGAACGAAATAGACAGATCGCTGAGATAGGTGCCTCACTGATTAAG 
CATTGGTAA (SEQ ID NO:41).
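A quick integrity check on a coding sequence such as bla (SEQ ID NO:41) is to confirm that it reads as an open reading frame. A minimal sketch, assuming the standard genetic code and an ORF supplied from ATG to stop: it translates the ORF and reports a missing start or stop, a length that is not a multiple of three, internal stops, and characters other than A, C, G and T. `translate` and `check_orf` are illustrative names.

```python
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(orf: str) -> str:
    """Translate complete codons; any codon with a non-ACGT character becomes 'X'."""
    orf = "".join(orf.split()).upper()
    return "".join(CODON_TABLE.get(orf[i:i + 3], "X") for i in range(0, len(orf) - 2, 3))


def check_orf(orf: str) -> list[str]:
    """Return a list of problems; an empty list means the ORF looks intact."""
    orf = "".join(orf.split()).upper()
    problems = []
    if not orf.startswith("ATG"):
        problems.append("does not start with ATG")
    if len(orf) % 3:
        problems.append(f"length {len(orf)} is not a multiple of 3")
    protein = translate(orf)
    if not protein.endswith("*"):
        problems.append("does not end with a stop codon")
    if "*" in protein[:-1]:
        problems.append("internal stop codon")
    if "X" in protein:
        problems.append("contains non-ACGT characters")
    return problems


print(translate("ATGAGTATTCAACATTTCCGTGTC"))  # first eight codons of bla: MSIQHFRV
print(check_orf("ATGGCCTAA"))                 # []
```

Pasting the full bla listing (whitespace is ignored) into `check_orf` returns an empty list only if the sequence is a clean ORF.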



bla-1 to bla-5 have the following sequences (SEQ ID NOs:80-84):
bla-1 

ACTAGTAACCCTGATAAATGCTTCAATAATATTGAAAAAGGAAGAGT 
ATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCCTTTTTTGCGGCAT 

TTTGCCTTCCTGTTTTTGCTCACCCAGAAACGCTGGTGAAAGTAAAAG
ATGCTGAAGATCAGTTGGGTGCACGAGTGGGTTACATCGAACTGGAT 
CTCAACAGCGGTAAGATCCTTGAGAGTTTTCGCCCCGAAGAACGTTT 
TCCAATGATGAGCACTTTTAAAGTTCTGCTATGTGGCGCGGTATTATC
CCGTATTGACGCCGGGCAAGAGCAACTCGGTCGCCGCATACACTATT 

CTCAGAATGACTTGGTTGAGTACTCACCAGTCACAGAAAAGCATCTT
ACGGATGGCATGACAGTAAGAGAATTATGCAGTGCTGCCATAACCAT 
GAGTGATAACACCGCGGCCAACTTACTTCTGACAACGATCGGAGGAC 
CGAAGGAGCTAACCGCTTTTTTGCACAACATGGGGGATCATGTAACT 
CGCCTTGATCGTTGGGAACCGGAGCTGAATGAAGCCATACCAAACGA 

CGAGCGTGACACCACGATGCCTGTAGCAATGGCAACAACGTTGCGCA
AACTATTAACTGGCGAACTACTTACTCTAGCTTCCCGGCAACAATTAA
TAGACTGGATGGAGGCGGATAAAGTTGCAGGACCACTTCTGCGCTCG 
GCCCTTCCGGCTGGCTGGTTTATTGCTGATAAATCTGGAGCCGGTGAG 
CGTGGCTCTCGCGGTATCATTGCAGCACTGGGGCCAGATGGTAAGCC 
CTCCCGTATCGTAGTTATCTACACGACGGGGAGTCAGGCAACTATGG
ATGAACGAAATAGACAGATCGCTGAGATAGGTGCCTCACTGATTAAG 
CATTGGTAACCACTGCAGTGGTTTTCCTTTTGCGGCCGC 

bla-2 

ACTAGTAACCCTGATAAATGCTGCAAACATATTGAAAAAGGAAGAGT
ATGAGTATTCAACATTTCCGTGTCGCACTCATTCCCTTCTTTGCGGCA 
TTTTGCTTGCCTGTTTTTGCACACCCCGAAACGCTGGTGAAAGTAAAA 
GATGCTGAAGATCAACTGGGTGCACGAGTGGGCTATATCGAACTGGA 
TCTCAATAGCGGTAAGATCCTTGAGAGTTTTCGCCCCGAAGAACGTTT 

TCCAATGATGAGCACTTTTAAAGTTCTGCTATGTGGCGCGGTATTATC
CCGTATTGACGCCGGGCAAGAGCAGCTCGGTCGCCGCATACACTACT 
CACAGAACGACTTGGTTGAGTACTCGCCGGTCACGGAAAAGCATCTT 
ACGGATGGCATGACAGTAAGAGAATTGTGTAGTGCTGCCATAACCAT 
GAGTGATAACACCGCGGCCAACTTACTTCTGACAACGATCGGAGGCC 

CTAAGGAGCTGACCGCATTTTTGCACAACATGGGGGATCATGTAACC
CGGCTTGATCGTTGGGAACCGGAGCTGAACGAAGCCATACCGAACGA 
CGAGCGTGACACCACGATGCCTGTAGCAATGGCAACAACGTTGCGCA 
AACTACTCACTGGCGAACTTCTCACTCTAGCATCACGACAGCAACTC 
ATAGACTGGATGGAGGCGGATAAAGTTGCAGGACCACTTCTGCGCTC 

GGCCCTTCCGGCTGGCTGGTTTATAGCTGATAAATCCGGTGCCGGTG
AACGCGGCTCTCGCGGGATCATTGCTGCGCTGGGGCCAGATGGTAAG 
CCCTCACGAATCGTAGTTATCTACACGACGGGGAGTCAGGCAACTAT 
GGATGAACGAAATAGACAGATCGCTGAGATAGGTGCCTCACTGATCA 
AGCACTGGTAGCCACTGCAGTGGTTTAGCTTTTGCGGCCGC 


bla-3 

ACTAGTAACCCTGACAAATGCTGCAAACATATTGAAAAAGGAAGAGT 

ATGAGCATCCAACATTTTCGTGTCGCACTCATTCCCTTCTTTGCGGCA 

TTTTGCTTGCCTGTTTTTGCACACCCCGAAACGCTGGTGAAAGTAAAA 

GATGCTGAAGATCAACTGGGTGCAAGAGTGGGCTATATCGAACTGGA
TCTCAATAGCGGCAAGATCCTTGAGTCTTTTCGCCCCGAAGAACGTTT 
TCCGATGATGAGCACTTTTAAAGTTCTGCTATGTGGCGCGGTGTTGTC 
CCGTATAGACGCCGGGCAAGAGCAGCTTGGTCGCCGTATACACTACT 
CACAAAACGACTTGGTTGAGTACTCGCCGGTCACGGAAAAGCATCTT 

ACGGATGGCATGACGGTAAGAGAATTGTGTAGTGCTGCCATTACCAT
GAGCGACAATACCGCGGCCAACTTACTTCTGACAACGATCGGAGGCC 
CTAAGGAGCTGACCGCATTTTTGCACAACATGGGGGATCATGTAACC 
CGGCTTGACCGCTGGGAACCGGAGCTGAACGAAGCCATACCGAACG 
ACGAGCGTGACACCACGATGCCTGTAGCAATGGCAACAACGTTGCGG 

AAACTACTCACTGGCGAACTTCTCACTCTAGCATCACGACAGCAGCT
CATAGACTGGATGGAGGCGGACAAAGTAGCAGGACCACTTCTTCGCT 
CGGCCCTCCCTGCTGGCTGGTTCATTGCTGATAAATCCGGTGCCGGTG 
AACGCGGCTCTCGCGGGATCATTGCTGCGCTGGGGCCTGATGGTAAG 
CCCTCACGAATCGTAGTAATCTACACGACGGGGAGTCAGGCCACTAT 
GGACGAACGAAATAGACAGATCGCTGAGATCGGTGCCTCACTGATCA
AGCACTGGTAACCACTGCAGTGGTTTAGCATTTGCGGCCGC 

bla-4 

ACTAGTAACCCTGACAAATGCTGCAAACATATTGAAAAAGGAAGAGT
ATGAGCATCCAACATTTTCGTGTCGCACTCATTCCCTTCTTTGCGGCA 
TTTTGCTTGCCTGTTTTTGCACACCCCGAAACGCTGGTGAAAGTAAAA 
GATGCTGAAGATCAACTGGGTGCAAGAGTGGGCTATATCGAACTGGA 
TCTCAATAGCGGCAAGATCCTTGAGTCTTTCCGCCCCGAAGAACGTTT 

TCCGATGATGAGCACTTTCAAAGTACTGCTATGTGGCGCGGTGTTGTC
CCGTATAGACGCCGGGCAAGAGCAGCTTGGTCGCCGTATACACTACT 
CACAAAACGACTTGGTTGAGTACTCGCCGGTCACGGAAAAGCATCTT 
ACGGATGGCATGACGGTAAGAGAATTGTGTAGTGCTGCCATTACCAT 
GAGCGATAATACCGCGGCCAACTTACTTCTGACAACGATCGGAGGCC 

CTAAGGAGCTGACCGCATTTTTGCACAACATGGGTGATCATGTGACC
CGGCTTGACCGCTGGGAACCGGAGCTGAACGAAGCCATACCGAACG 
ACGAGCGTGACACCACGATGCCTGTAGCAATGGCAACAACTCTTCGG 
AAACTACTCACTGGCGAACTTCTCACTCTAGCATCACGACAGCAGCT 
CATAGACTGGATGGAGGCGGACAAAGTAGCAGGACCACTTCTTCGCT 

CGGCCCTCCCTGCTGGCTGGTTCATTGCTGATAAATCTGGAGCCGGTG
AGCGTGGCTCTCGCGGTATCATTGCTGCGCTGGGGCCTGATGGTAAG 
CCCTCACGAATCGTAGTAATCTACACGACGGGGAGTCAGGCCACTAT 
GGACGAACGAAATAGACAGATCGCTGAGATCGGTGCCTCACTGATCA 
AGCACTGGTAACCACTGCAGTGGTTTAGCATTTGCGGCCGC 


bla-5 

ACTAGTAACCCTGACAAATGCTGCAAACATATTGAAAAAGGAAGAGT 

ATGAGCATCCAACATTTTCGTGTCGCACTCATTCCCTTCTTTGCGGCA 

TTTTGCTTGCCTGTTTTTGCACACCCCGAAACGCTGGTGAAAGTAAAA 

GATGCTGAAGATCAACTGGGTGCAAGAGTGGGCTATATCGAACTGGA
TCTCAATAGCGGCAAGATCCTTGAGTCTTTCCGCCCCGAAGAACGAT 
TCCCGATGATGAGCACTTTCAAAGTACTGCTATGTGGCGCGGTGTTGT 
CCCGTATAGACGCCGGGCAAGAGCAGCTTGGTCGCCGTATACACTAC 
TCACAAAACGACTTGGTTGAGTACTCGCCGGTCACGGAAAAGCATCT 

TACGGATGGCATGACGGTAAGAGAATTGTGTAGTGCTGCCATTACCA
TGAGCGATAATACCGCGGCCAACTTACTTCTGACAACGATCGGAGGC 
CCTAAGGAGCTGACCGCATTTTTGCACAACATGGGTGATCATGTGAC 
CCGGCTTGACCGCTGGGAACCGGAGCTGAACGAAGCCATACCGAAC 
GACGAGCGTGATACCACGATGCCAGTAGCAATGGCCACAACTCTTCG 

GAAACTACTCACTGGCGAACTTCTCACTCTAGCATCACGACAGCAGC
TCATAGACTGGATGGAGGCGGACAAAGTAGCAGGACCACTTCTTCGC 
TCGGCCCTCCCTGCTGGCTGGTTCATTGCTGACAAATCCGGTGCCGGT 
GAACGCGGCTCTCGCGGCATCATTGCTGCGCTGGGGCCTGATGGTAA 
GCCCTCACGAATCGTAGTAATCTACACGACGGGGAGTCAGGCCACTA 

TGGACGAACGAAATAGACAGATCGCTGAGATCGGTGCCTCACTGATC
AAGCACTGGTAACCACTGCAGTGGTTTAGCATTTGCGGCCGCNNN. 
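bla-1 to bla-5 differ from bla at many positions. One way to see whether such recoding changes the encoded protein is to compare two ORFs codon by codon and ask whether each changed codon still encodes the same amino acid. The sketch assumes both inputs are in-frame ORFs of equal length with any flanking sequence (such as the ACTAGT that opens each bla-n entry) already removed; `codon_differences` is a hypothetical name.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codon_differences(orf_a: str, orf_b: str) -> list[tuple[int, str, str, bool]]:
    """List (codon number, codon in a, codon in b, synonymous?) for each differing codon."""
    a = "".join(orf_a.split()).upper()
    b = "".join(orf_b.split()).upper()
    if len(a) != len(b):
        raise ValueError("ORFs differ in length; align them first")
    diffs = []
    for n in range(len(a) // 3):
        ca, cb = a[3 * n:3 * n + 3], b[3 * n:3 * n + 3]
        if ca != cb:
            # Synonymous only if both codons are valid and encode the same residue.
            same = CODON_TABLE.get(ca, "X") == CODON_TABLE.get(cb, "X") != "X"
            diffs.append((n + 1, ca, cb, same))
    return diffs


# First 48 nt of bla and of the bla-2 ORF, copied from the listings above.
bla_5p = "ATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCCTTTTTTGCGGCA"
bla2_5p = "ATGAGTATTCAACATTTCCGTGTCGCACTCATTCCCTTCTTTGCGGCA"
print(codon_differences(bla_5p, bla2_5p))
# [(9, 'GCC', 'GCA', True), (10, 'CTT', 'CTC', True), (13, 'TTT', 'TTC', True)]
```

In this 48-nt stretch all three codon changes are synonymous; a `False` flag anywhere in a full-length comparison would mark an amino-acid change.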

Table 30

Pairwise identity of different bla gene versions
Note: sequence "bla" is the bla gene from pGL3-Basic; alignment by ClustalW (Slow/Accurate, IUB); sequence comparisons were of the ORF only.
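For readers reproducing pairwise identities like those of Table 30, a minimal sketch of percent identity over an existing alignment follows. It counts identical columns among those where neither sequence has a gap. That denominator is an assumption: ClustalW and other tools may normalize differently, so values from this sketch need not match the table exactly. `percent_identity` is an illustrative name.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Same-length ORFs with no indels align trivially, column by column.
bla_5p = "ATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCCTTTTTTGCGGCA"
bla2_5p = "ATGAGTATTCAACATTTCCGTGTCGCACTCATTCCCTTCTTTGCGGCA"
print(percent_identity(bla_5p, bla2_5p))  # 93.75 (45 of 48 positions identical)
```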


SpeI-NcoI-Ver2-start has the following sequence:

ACTAGTACGTCTCTCAAGGATAAGTAAGTAATATTAAGGTACGGGAG 
GTACTTGGAGCGGCCGCAATAAAATATCTTTATTTTCATTACATCTGT 
GTGTTGGTTTTTTGTGTGAATCGATAGTACTAACATACGCTCTCCATC 
AAAACAAAACGAAACAAAACAAACTAGCAAAATAGGCTGTCCCCAG
TGCAAGTGCAGGTGCCAGAACATTTCTCTGGCCTAAGTGGCCGGTAC 
CGAGCTCGCTAGCCTCGAGGATATCAGATCTGGCCTCGGCGGCCAAG 
CTTGGCAATCCGGTACTGTTGGTAAAGCCACCATGG (SEQ ID NO:48); 
and 


SpeI-NcoI-Ver2 has the following sequence: 

ACTAGTACGTCTCTCAAGGATAAGTAAGTAATATTAAGGTACGGGAG 
GTATTGGACAGGCCGCAATAAAATATCTTTATTTTCATTACATCTGTG 
TGTTGGTTTTTTGTGTGAATCGATAGTACTAACATACGCTCTCCATCA 
AAACAAAACGAAACAAAACAAACTAGCAAAATAGGCTGTCCCCAGT
GCAAGTGCAGGTGCCAGAACATTTCTCTGGCCTAACTGGCCGGTACC 
TGAGCTCGCTAGCCTCGAGGATATCAAGATCTGGCCTCGGCGGCCAA 
GCTTGGCAATCCGGTACTGTTGGTAAAGCCACCATGG (SEQ ID NO:49) 

pGL4-related sequences include (SEQ ID NOs:95-97):

pGL4B-4NN 

GCGGCCGCAAATGCTAAACCACTGCAGTGGTTACCAGTGCTTGATCA 
GTGAGGCACCGATCTCAGCGATCTGTCTATTTCGTTCGTCCATAGTGG
CCTGACTCCCCGTCGTGTAGATTACTACGATTCGTGAGGGCTTACCAT 
CAGGCCCCAGCGCAGCAATGATGCCGCGAGAGCCGCGTTCACCGGCA 
CCGGATTTGTCAGCAATGAACCAGCCAGCAGGGAGGGCCGAGCGAA
GAAGTGGTCCTGCTACTTTGTCCGCCTCCATCCAGTCTATGAGCTGCT
GTCGTGATGCTAGAGTGAGAAGTTCGCCAGTGAGTAGTTTCCGAAGA 
GTTGTGGCCATTGCTACTGGCATCGTGGTATCACGCTCGTCGTTCGGT 
ATGGCTTCGTTCAGCTCCGGTTCCCAGCGGTCAAGCCGGGTCACATG
ATCACCCATGTTGTGCAAAAATGCGGTCAGCTCCTTAGGGCCTCCGA 
TCGTTGTCAGAAGTAAGTTGGCCGCGGTATTATCGCTCATGGTAATGG 
CAGCACTACACAATTCTCTTACCGTCATGCCATCCGTAAGATGCTTTT 
CCGTGACCGGCGAGTACTCAACCAAGTCGTTTTGTGAGTAGTGTATA 

CGGCGACCAAGCTGCTCTTGCCCGGCGTCTATACGGGACAACACCGC
GCCACATAGCAGTACTTTGAAAGTGCTCATCATCGGGAATCGTTCTTC 
GGGGCGGAAAGACTCAAGGATCTTGCCGCTATTGAGATCCAGTTCGA 
TATAGCCCACTCTTGCACCCAGTTGATCTTCAGCATCTTTTACTTTCAC
CAGCGTTTCGGGGTGTGCAAAAACAGGCAAGCAAAATGCCGCAAAG 

AAGGGAATGAGTGCGACACGAAAATGTTGGATGCTCATACTCTTCCT
TTTTCAATATGTTTGCAGCATTTGTCAGGGTTACTAGTACGTCTCTCTT 
GAGAGACCGCGATCGCCACCATGTCTAGGTAGGTAGTAAACGAAAG 
GGCTTAAAGGCCTAAGTGGCCCTCGAGTCCAGCCTTGAGTTGGTTGA 
GTCCAAGTCACGTTTGGAGATCTGGTACCTTACGCGTATGAGCTCTAC 

GTAGCTAGCGGCCTCGGCGGCCGAATTCTTGCGATCTAAGCTTGGCA
ATCCGGTACTGTTGGTAAAGCCACCATGG 
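The first line of pGL4B-4NN (GCGGCCGCAAATGCTAAACCACTGCAGTGG...) is the reverse complement of the last 47 nt of bla-4, which suggests that the bla sequence is carried on the opposite strand in this NotI-NcoI fragment. A minimal sketch for checking such orientation questions searches a target for a query and for its reverse complement; `locate` is a hypothetical helper, and only exact matches over plain A/C/G/T are handled.

```python
# Plain A/C/G/T complement; assumes the sequences contain no IUPAC ambiguity codes.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return "".join(seq.split()).translate(COMPLEMENT)[::-1]


def locate(query: str, target: str) -> list[tuple[str, int]]:
    """Return (strand, 1-based start) for every exact occurrence of query in target."""
    query = "".join(query.split()).upper()
    target = "".join(target.split()).upper()
    hits = []
    for strand, probe in (("+", query), ("-", reverse_complement(query))):
        start = target.find(probe)
        while start != -1:
            hits.append((strand, start + 1))
            start = target.find(probe, start + 1)
    return hits


# Last 47 nt of bla-4 and first 47 nt of pGL4B-4NN, copied from the listings above.
bla4_3prime = "TGATCAAGCACTGGTAACCACTGCAGTGGTTTAGCATTTGCGGCCGC"
pgl4b_4nn_5prime = "GCGGCCGCAAATGCTAAACCACTGCAGTGGTTACCAGTGCTTGATCA"
print(locate(bla4_3prime, pgl4b_4nn_5prime))  # [('-', 1)]
```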

pGL4B-4NN1

gcggccgcaaatgctaaaccactgcagtggttaccagtgcttgatcagtgaggcaccgatctcagcgatctgtctatt 
tcgttcgtccatagtggcctgactccccgtcgtgtagattactacgattcgtgagggcttaccatcaggccccagcgc
agcaatgatgccgcgagagccgcgttcaccggcccccgatttgtcagcaatgaaccagccagcagggagggccg 
agcgaagaagtggtcctgctactttgtccgcctccatccagtctatgagctgctgtcgtgatgctagagtaagaagttc 
gccagtgagtagtttccgaagagttgtggccattgctactggcatcgtggtatcacgctcgtcgttcggtatggcttcgt 
tcaactccggttcccagcggtcaagccgggtcacatgatcacccatgttgtgcaaaaatgcggtcagctccttaggg 
cctccgatcgttgtcagaagtaagttggccgcggtgttgtcgctcatggtaatggcagcactacacaattctcttaccgt
catgccatccgtaagatgcttttccgtgaccggcgagtactcaaccaagtcgttttgtgagtagtgtatacggcgacca 
agctgctcttgcccggcgtctatacgggacaacaccgcgccacatagcagtactttgaaagtgctcatcatcgggaa 
tcgttcttcggggcggaaagactcaaggatcttgccgctattgagatccagttcgatatagcccactcttgcacccagt 
tgatcttcagcatcttttactttcaccagcgtttcggggtgtgcaaaaacaggcaagcaaaatgccgcaaagaaggga 
atgagtgcgacacgaaaatgttggatgctcatactcttcctttttcaatatgtttgcagcatttgtcagggttactagtacg
tctctcttgagagaccgcgatcgccaccatgtctaggtaggtagtaaacgaaagggcttaaaggcctaagtggccct 
cgagtccagccttgagttggttgagtccaagtcacgtttggagatctggtaccttacgcgtatgagctctacgtagcta 
gcggcctcggcggccgaattcttgcgttcgaagcttggcaatccggtactgttggtaaagccaccatgg; and 

pGL4B-4NN2

GCGGCCGCAAATGCTAAACCACTGCAGTGGTTACCAGTGCTTGATCA 
GTGAGGCACCGATCTCAGCGATCTGCCTATTTCGTTCGTCCATAGTGG 
CCTGACTCCCCGTCGTGTAGATCACTACGATTCGTGAGGGCTTACCAT 
CAGGCCCCAGCGCAGCAATGATGCCGCGAGAGCCGCGTTCACCGGCC 

CCCGATTTGTCAGCAATGAACCAGCCAGCAGGGAGGGCCGAGCGAA
GAAGTGGTCCTGCTACTTTGTCCGCCTCCATCCAGTCTATGAGCTGCT 
GTCGTGATGCTAGAGTAAGAAGTTCGCCAGTGAGTAGTTTCCGAAGA 
GTTGTGGCCATTGCTACTGGCATCGTGGTATCACGCTCGTCGTTCGGT 
ATGGCTTCGTTCAACTCTGGTTCCCAGCGGTCAAGCCGGGTCACATG 
ATCACCCATGTTGTGCAAAAATGCGGTCAGCTCCTTAGGGCCTCCGA
TCGTTGTCAGAAGTAAGTTGGCCGCGGTGTTGTCGCTCATGGTAATGG 
CAGCACTACACAATTCTCTTACCGTCATGCCATCCGTAAGATGCTTTT 
CCGTGACCGGCGAGTACTCAACCAAGTCGTTTTGTGAGTAGTGTATA 
CGGCGACCAAGCTGCTCTTGCCCGGCGTCTATACGGGACAACACCGC
GCCACATAGCAGTACTTTGAAAGTGCTCATCATCGGGAATCGTTCTTC 
GGGGCGGAAAGACTCAAGGATCTTGCCGCTATTGAGATCCAGTTCGA 
TATAGCCCACTCTTGCACCCAGTTGATCTTCAGCATCTTTTACTTTCAC
CAGCGTTTCGGGGTGTGCAAAAACAGGCAAGCAAAATGCCGCAAAG 

AAGGGAATGAGTGCGACACGAAAATGTTGGATGCTCATACTCTTCCT
TTTTCAATATGTTTGCAGCATTTGTCAGGGTTACTAGTACGTCTCTCTT 
GAGAGACCGCGATCGCCACCATGTCTAGGTAGGTAGTAAACGAAAG 
GGCTTAAAGGCCTAAGTGGCCCTCGAGTCCAGCCTTGAGTTGGTTGA 
GTCCAAGTCACGTTTGGAGATCTGGTACCTTACGCGTATGAGCTCTAC 

GTAGCTAGCGGCCTCGGCGGCCGAATTCTTGCGTTCGAAGCTTGGCA
ATCCGGTACTGTTGGTAAAGCCACCATGG, 

as well as 

pGL4B-4NN3: 

GCGGCCGCAAATGCTAAACCACTGCAGTGGTTACCAGTGCTTGATCA
GTGAGGCACCGATCTCAGCGATCTGCCTATTTCGTTCGTCCATAGTGG 
CCTGACTCCCCGTCGTGTAGATCACTACGATTCGTGAGGGCTTACCAT 
CAGGCCCCAGCGCAGCAATGATGCCGCGAGAGCCGCGTTCACCGGCC 
CCCGATTTGTCAGCAATGAACCAGCCAGCAGGGAGGGCCGAGCGAA 

GAAGTGGTCCTGCTACTTTGTCCGCCTCCATCCAGTCTATGAGCTGCT
GTCGTGATGCTAGAGTAAGAAGTTCGCCAGTGAGTAGTTTCCGAAGA 
GTTGTGGCCATTGCTACTGGCATCGTGGTATCACGCTCGTCGTTCGGT 
ATGGCTTCGTTCAACTCTGGTTCCCAGCGGTCAAGCCGGGTCACATG 
ATCACCCATATTATGAAGAAATGCAGTCAGCTCCTTAGGGCCTCCGA 

TCGTTGTCAGAAGTAAGTTGGCCGCGGTGTTGTCGCTCATGGTAATGG
CAGCACTACACAATTCTCTTACCGTCATGCCATCCGTAAGATGCTTTT 
CCGTGACCGGCGAGTACTCAACCAAGTCGTTTTGTGAGTAGTGTATA 
CGGCGACCAAGCTGCTCTTGCCCGGCGTCTATACGGGACAACACCGC 
GCCACATAGCAGTACTTTGAAAGTGCTCATCATCGGGAATCGTTCTTC 

GGGGCGGAAAGACTCAAGGATCTTGCCGCTATTGAGATCCAGTTCGA
TATAGCCCACTCTTGCACCCAGTTGATCTTCAGCATCTTTTACTTTCAC 
CAGCGTTTCGGGGTGTGCAAAAACAGGCAAGCAAAATGCCGCAAAG 
AAGGGAATGAGTGCGACACGAAAATGTTGGATGCTCATACTCTTCCT 
TTTTCAATATGTTTGCAGCATTTGTCAGGGTTACTAGTACGTCTCTCTT 

GAGAGACCGCGATCGCCACCATGTCTAGGTAGGTAGTAAACGAAAG
GGCTTAAAGGCCTAAGTGGCCCTCGAGTCCAGCCTTGAGTTGGTTGA 
GTCCAAGTCACGTTTGGAGATCTGGTACCTTACGCGTATGAGGGTTG 
AGTCCAAGTCACGTTTGGAGATCTGGTACCTTACGCGTATGAGCTCTA 
CGTAGCTAGCGGCCTCGGCGGCCGAATTCTTGCGTTCGAAGCTTGGC 

AATCCGGTACTGTTGGTAAAGCCACCATGG (SEQ ID NO:45)

pGL4NN from Blue Heron: 

GCGGCCGCAAATGCTAAACCACTGCAGTGGTTACCAGTGCTTGATCA
GTGAGGCACCGATCTCAGCGATCTGCCTATTTCGTTCGTCCATAGTGG
CCTGACTCCCCGTCGTGTAGATCACTACGATTCGTGAGGGCTTACCAT 
CAGGCCCCAGCGCAGCAATGATGCCGCGAGAGCCGCGTTCACCGGCC
CCCGATTTGTCAGCAATGAACCAGCCAGCAGGGAGGGCCGAGCGAA 
GAAGTGGTCCTGCTACTTTGTCCGCCTCCATCCAGTCTATGAGCTGCT
GTCGTGATGCTAGAGTAAGAAGTTCGCCAGTGAGTAGTTTCCGAAGA 
GTTGTGGCCATTGCTACTGGCATCGTGGTATCACGCTCGTCGTTCGGT 
ATGGCTTCGTTCAACTCTGGTTCCCAGCGGTCAAGCCGGGTCACATG 
ATCACCCATATTATGAAGAAATGCAGTCAGCTCCTTAGGGCCTCCGA 

TCGTTGTCAGAAGTAAGTTGGCCGCGGTGTTGTCGCTCATGGTAATGG
CAGCACTACACAATTCTCTTACCGTCATGCCATCCGTAAGATGCTTTT 
CCGTGACCGGCGAGTACTCAACCAAGTCGTTTTGTGAGTAGTGTATA 
CGGCGACCAAGCTGCTCTTGCCCGGCGTCTATACGGGACAACACCGC 
GCCACATAGCAGTACTTTGAAAGTGCTCATCATCGGGAATCGTTCTTC 

GGGGCGGAAAGACTCAAGGATCTTGCCGCTATTGAGATCCAGTTCGA
TATAGCCCACTCTTGCACCCAGTTGATCTTCAGCATCTTTTACTTTCAC 
CAGCGTTTCGGGGTGTGCAAAAACAGGCAAGCAAAATGCCGCAAAG 
AAGGGAATGAGTGCGACACGAAAATGTTGGATGCTCATACTCTTCCT
TTTTCAATATGTTTGCAGCATTTGTCAGGGTTACTAGTACGTCTCTCA 

AGAGATTTGTGCATACACAGTGACTCATACTTTCACCAATACTTTGCA
TTTTGGATAAATACTAGACAACTTTAGAAGTGAATTATTTATGAGGTT 
GTCTTAAAATTAAAAATTACAAAGTAATAAATCACATTGTAATGTATT 
TTGTGTGATACCCAGAGGTTTAAGGCAACCTATTACTCTTATGCTCCT 
GAAGTCCACAATTCACAGTCCTGAACTATAATCTTATCTTTGTGATTG 

CTGAGCAAATTTGCAGTATAATTTCAGTGCTTTTAAATTTTGTCCTGC
TTACTATTTTCCTTTTTTATTTGGGTTTGATATGCGTGCACAGAATGGG 
GCTTCTATTAvAAATATTCTTGAGAGACCGCGATCGCCACCATGTCTAG 
GTAGGTAGTAAACGAAAGGGCTTAAAGGCCTAAGTGGCCCTCGAGTC 
CAGCCTTGAGTTGGTTGAGTCCAAGTCACGTTTGGAGATCTGGTACCT 

TACGCGTATGAGCTCTACGTAGCTAGCGGCCTCGGCGGCCGAATTCT
TGCGTTCGAAGCTTGGCAATCCGGTACTGTTGGTAAAGCCACCATGG 
(SEQ ID NO:46),

pGL4 with promoter changes: 


GCGGCCGCAAATGCTAAACCACTGCAGTGGTTACCAGTGCTTGATCA 

GTGAGGCACCGATCTCAGCGATCTGCCTATTTCGTTCGTCCATAGTGG 

CCTGACTCCCCGTCGTGTAGATCACTACGATTCGTGAGGGCTTACCAT 

CAGGCCCCAGCGCAGCAATGATGCCGCGAGAGCCGCGTTCACCGGCC 

CCCGATTTGTCAGCAATGAACCAGCCAGCAGGGAGGGCCGAGCGAA

GAAGTGGTCCTGCTACTTTGTCCGCCTCCATCCAGTCTATGAGCTGCT 

GTCGTGATGCTAGAGTAAGAAGTTCGCCAGTGAGTAGTTTCCGAAGA 

GTTGTGGCCATTGCTACTGGCATCGTGGTATCACGCTCGTCGTTCGGT 

ATGGCTTCGTTCAACTCTGGTTCCCAGCGGTCAAGCCGGGTCACATG 

ATCACCCATATTATGAAGAAATGCAGTCAGCTCCTTAGGGCCTCCGA
TCGTTGTCAGAAGTAAGTTGGCCGCGGTGTTGTCGCTCATGGTAATGG
CAGCACTACACAATTCTCTTACCGTCATGCCATCCGTAAGATGCTTTT 
CCGTGACCGGCGAGTACTCAACCAAGTCGTTTTGTGAGTAGTGTATA 
CGGCGACCAAGCTGCTCTTGCCCGGCGTCTATACGGGACAACACCGC 
GCCACATAGCAGTACTTTGAAAGTGCTCATCATCGGGAATCGTTCTTC
GGGGCGGAAAGACTCAAGGATCTTGCCGCTATTGAGATCCAGTTCGA 
TATAGCCCACTCTTGCACCCAGTTGATCTTCAGCATCTTTTACTTTCAC 
CAGCGTTTCGGGGTGTGCAAAAACAGGCAAGCAAAATGCCGCAAAG 
AAGGGAATGAGTGCGACACGAAAATGTTGGATGCTCATACTCGTCCT

TTTTCAATATTATTGAAGCATTTATCAGGGTTACTAGTACGTCTCTCA
AGAGATTTGTGCATACACAGTGACTCATACTTTCACCAATACTTTGCA 
TTTTGGATAAATACTAGACAACTTTAGAAGTGAATTATTTATGAGGTT 
GTCTTAAAATTAAAAATTACAAAGTAATAAATCACATTGTAATGTATT 
TTGTGTGATACCCAGAGGTTTAAGGCAACCTATTACTCTTAT (SEQ ID NO:47),

A hygromycin gene in a pGL4 vector: 

Atgaagaagcccgaactcaccgctaccagcgttgaaaaatttctcatcgagaagttcgacagtgtgagcgacctgat 
gcagttgtcggagggcgaagagagccgagccttcagcttcgatgtcggcggacgcggctatgtactgcgggtgaa
tagctgcgctgatggcttctacaaagaccgctacgtgtacc^^ 

tgttggacatcggcgagttcagcgagagcctgacatactgcatcagtagacgcgcccaaggcgttactctccaaga 
cctccccgaaacagagctgcctgctgtgttacagcctgtcgccgaagctatggatgctattgccgccgccgacctca 
gtcaaaccagcggcttcggcccattcgggccccaaggcatcggccagtacacaacctggcgggatttcatttgcgc 

cattgctgatccccatgtctaccactggcagaccgtgatggacgacaccgtgtccgccagcgtagctcaagccctgg
acgaactgatgctgtgggccgaagactgtcccgaggtgcgccacctcgtccatgccgacttcggcagcaacaacgt 
cctgaccgacaacggccgcatcaccgccgtaatcgactggtccgaagctatgttcggggacagtcagtacgaggtg 
gccaacatcttcttctggcggccctggctggcttgcatggagcagcagactcgctacttcgagcgccggcatcccga 
gctggccggcagccctcgtctgcgagcctacatgctgcgcatcggcctggatcagctctaccagagcctcgtggac 

ggcaacttcgacgatgctgcctgggctcaaggccgctgcgatgccatcgtccgcagcggggccggcaccgtcggt
cgcacacaaatcgctcgccggagcgcagccgtatggaccgacggctgcgtcgaggtgctggccgacagcggca 
accgccggcccagtacacgaccgcgcgctaaggaggtaggtcgagtttaa (SEQ ID NO:88),

pGL4.10

ggcctaactggccggtacctgagctcgctagcctcgaggatatcaagatctggcctcggcggccaagcttggcaat 
ccggtactgttggtaaagccaccatggaagatgccaaaaacattaagaagggcccagcgccattctacccactcga 
agacgggaccgccggcgagcagctgcacaaagccatgaagcgctacgccctggtgcccggcaccatcgccttta 
ccgacgcacatatcgaggtggacattacctacgccgagtacttcgagatgagcgttcggctggcagaagctatgaa
gcgctatgggctgaatacaaaccatcggatcgtggtgtgcagcgagaatagcttgcagttcttcatgcccgtgttggg 
tgccctgttcatcggtgtggctgtggccccagctaacgacatctacaacgagcgcgagctgctgaacagcatgggc 
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atcagccagcccaccgtcgtattcgtgagcaagaaagggctgcaaaagatcctcaacgtgcaaaagaagctaccg 

atcatacaaaagatcatcatcatggatagcaagaccgactaccagggcttccaaagcatgtacaccttcgtgacttcc 

camgccacccggcttcaacgagtacgacttcgtgcccgagagcttcgaccgggacaaaaccatcgccctgatcat 

gaacagtagtggcagtaccggattgcccaagggcgtagccctaccgcaccgcaccgcttgtgtccgattcagtcat 

gcccgcgaccccatcttcggcaaccagatcatccccgacaccgctatcctcagcgtggtgccatttcaccacggctt

cggcatgttcaccacgctgggctacttgatctgcggctttcgggtcgtgctcatgtaccgcttcgaggaggagctattc 
ttgcgcagcttgcaagactataagattcaatctgccc^ 

catcgacaagtacgacctaagcaacttgcacgagatcgccagcggcggggcgccgctcagcaaggaggtaggtg 
aggccgtggccaaacgcttccacctaccaggcatccgccagggctacggcctgacagaaacaaccagcgccattc 

tgatcacccccgaaggggacgacaagcctggcgcagtaggcaaggtggtgcccttcttcgaggctaaggtggtgg
acttggacaccggtaagacactgggtgtgaaccagcgcggcgagctgtgcgtccgtggccccatgatcatgagcg 
gctacgttaacaaccccgaggctacaaacgctctcatcgacaaggacggctggctgcacagcggcgacatcgcct 
actgggacgaggacgagcacttcttcatcgtggaccggctgaagagcctgatcaaatacaagggctaccaggtagc 
cccagccgaactggagagcatcctgctgcaacaccccaacatcttcgacgccggggtcgccggcctgcccgacg 

acgatgccggcgagctgcccgccgcagtcgtcgtgctggaacacggtaaaaccatgaccgagaaggagatcgtg
gactatgtggccagccaggttacaaccgccaagaagctgcgcggtggtgttgtgttcgtggacgaggtgcctaaag 
gactgaccggcaagttggacgcccgcaagatccgcgagattctcattaaggccaagaagggcggcaagatcgcc 
gtgtaataattctagagtcggggcggccggccgcttcgagcagacatgataagatacattgatgagtttggacaaac 
cacaactagaatgcagtgaaaaaaatgcmatttgtgaaamgtgatgctattgctttamgtaaccattataa^ 

ataaacaagttaacaacaacaattgcattcatmatgm^

aaaacctctacaaatgtggtaaaatcgataaggatccgtcgaccgatgcccttgagagccttcaacccagtcagctcc 
ttccggtgggcgcggggcatgactatcgtcgccgcacttatgactgtcttctttatcatgcaactcgtaggac 
cggcagcgctcttccgcttcctcgctcactgactcgctgcgctcggtcgttcggctgcggcgagcggtato 
ctcaaaggcggtaatacggttatccacagaatcaggggataacgcaggaaagaacatgtgagcaaaaggccagca 
aaaggccaggaaccgtaaaaaggccgcgttgctggcgtttttccataggctccgcccccctgacgagcatcacaaa
aatcgacgctcaagtcagaggtggcgaaacccgacaggactataaagataccaggcgtttccccctggaagctccc 
tcgtgcgctctcctgttccgaccctgccgcttaccggatacctgtccgcctttctcccttcgggaag 
catagctcacgctgtaggtatctcagttcg^ 

agcccgaccgctgcgccttatccggtaactatcgtcttgagtccaacccggtaagacacgacttatcgccactggca 
gcagccactggtaacaggattagcagagcgaggtatgtaggcggtgctacagagttcttgaagtggtggcctaact^
cggctacactagaagaacagtatttggtatctgcgctctgctgaagccagttaccttcggaaaaagagttg^ 
tgatccggcaaacaaaccaccgctggtagcggtggttttmgtttgcaagcagcagattacgcgcagaaaaaaagg 
atctcaagaagatcctttgatctmctacggggtctgacgctcagtggaacgaaaactcacgttaagggattttggtca 
tgagattatcaaaaaggatcttcacctagatcctttt^ 

aacttggtctgacagcggccgcaaatgctaaaccactgcagtggttaccag^gcttgatcagtgaggcaccgatctc
agcgatctgcctatttcgttcgtccatagtggcctgactccccgtcgtgtagatcactacgattcgtgag 
caggccccagcgcagcaatgatgccgcgagagccgcgttcaccggcccccgatttgtcagcaatgaaccagcca 
gcagggagggcxgagcgaagaagtggtcctgctacmgto^ 

tagagtaagaagttcgccagtgagtagmccgaagagttgtggccattgctactggcatcgtggtatcacgcto^ 
ttcggtatggcttcgttcaactctggttcccagcggtcaagccgggtcacatgatcacccatattatgaagaaatgcag
tcagctccttagggc^tccgatcgttgtcagaagtaagttggccgcggtgttgtcgctcatg^ 
acaattctcttaccgtcatgccatccgtaagatgctmccgtgaccggcgagtactcaaccaagtcgt^ 
gtatacggcgaccaagctgctcttgcccggcgtctatacgggacaacaccgcgccacatagcagtactttgaaagtg 
ctcatcatcgggaatcgttcttcggggcggaaagactcaaggatcttgccgctattgagatccagttcgatatagccc 
actcttgcacccagttgatcttcagcatctttta^

cgcaaagaagggaatgagtgcgacacgaaaatgttggatgctcatactcgtccttmcaatatto^ 
agggttactagtacgtctctcaaggataagtaagtaatattaaggtacgggaggtattggacaggccgcaataaaata 
tctttattttcattacatctgtgtgttggttttttgtgtgaatcgatagtactaacatacgcto^ 
caaaacaaactagcaaaataggctgtccccagtgc 
gggaggtattggacaggccgcaataaaatatc^^ (SEQ ID NO:89), and

pGL4.70 


ggcctaactggccggtacctgagctcgc^ 

ccggtactgttggtaaagccaccatggcttccaaggtgtacgaccccgagcaacgcaaacgcatgatcactgggcc 
tcagtggtgggctcgctgcaagcaaatgaacgtgctggactccttcatcaactactatgattccgagaagcacgccg 
agaacgccgtgattmctgcatggtaacgctgcctccagctacctgtggaggcacgtcgtgcctcacatcgag^ 

tggctagatgcatcatccctgatctgatcggaatgggtaagtccggcaagagcgggaatggctcatatcgcctcctg
gatcactacaagtacctcaccgcttggttcgagctgctgaaccttccaaagaaaatcatctttgtgggccacgactgg 
ggggcttgtctggcctttcactactcctacgagcaccaagacaagatcaaggccatcgtccatgctgagagtgtcgtg 
gacgtgatcgagtcctgggacgagtggcctgacatcgaggaggatetcgccctgatcaagagcgaagagggcga 
gaaaatggtgcttgagaataacttcttcgtcgagaccatgctcccaagcaagatcatgcggaaactggagcctgagg 

agttcgctgcctacctggagccattcaaggagaagggcgaggttagacggcctaccctctcctggcctcgcgagat
ccctctcgttaagggaggcaagcccgacgtcgtccagattgtccgcaactacaacgcctaccttcgggccagcgac 
gatctgcctaagatgttcatcgagtccgaccctgggttcttttccaacgctattgtcgagggagctaagaagttccctaa 
caccgagttcgtgaaggtgaagggcctccacttcagccaggaggacgctccagatgaaatgggtaagtacatcaag 
agcttcgtggagcgcgtgctgaagaacgagcagtaattctagagtcggggcggccggccgcttcgagcagacatg 

ataagatacattgatgagtttggacaaaccacaactagaatgcagtgaaaaaaatgctttatttgtgaaatttgtgatgct
attgcmatttgtaaccattataagctgcaataaacaagttaacaacaacaattgcattcattttatgtttcaggttcaggg 
ggaggtgtgggaggttttttaaagcaagtaaaacctctac^^ 

ccttgagagccttcaacccagtcagctccttccggtgggcgcggggcatgactatcgtcgccgcacttatgactgt^^ 
tctttatcatgcaactcgtaggacaggtgccggcagcgctcttccgcttcctcgctcactgactcgctgcgctcggtcg 

ttcggctgcggcgagcggtatcagctcactcaaaggcggtaatacggttatccacagaatcaggggataacgcagg
aaagaacatgtgagcaaaaggccagcaaaaggccaggaaccgtaaaaaggcc^^ 
gctccgcccccctgacgagcatcacaaaaatcgacgctcaagtcagaggtggcgaaacccgacaggactataaag 
ataccaggcgmccccctggaagctccctcgtgcgctctcctgttccgaccctgccgcttaccggatacctgtc^ 
cmctcccttcgggaagcgtggcgcmctcatagctcacgctgtaggtatctcagttcggtgtaggtcgtt^ 

agctgggctgtgtgcacgaaccccccgttcagcccgaccgctgcgccttatccggtaactatcgtctt^

ccggtaagacacgacttatcgccactggcagcagccactggtaacaggattagcagagcgaggtatgtaggcggt 
gctacagagttcttgaagtggtggcctaactacggctacactagaagaacagtamggtatctgcgctctgctga 
cagttaccttcggaaaaagagttggtagctcttgatccggcaaacaaaccaccgctggtagcggtg glUUUgU lgc 
aagcagcagattacgcgcagaaaaaaaggatctcaagaagatccmgatctmctacggggtctgacgctcagt^^ 

aacgaaaactcacgttaagggattttggtcatgagattatcaaaaaggatcttcacctagatccttttaaattaaaaa^
agtmaaatcaatctaaagtatatatgagtaaacttggtctgacagcggccgcaaatgctaaaccactgcagtggto 
cagtgcttgatcagtgaggcaccgatctcagcgatctgcctatttcgttcgtccatagtggcctgactcccc^ 
gatcactacgattcgtgagggcttaccatcaggccccagcgcagcaatgatgccgcgagagccgcgttcaccggc 
ccccgatttgtcagcaatgaaccagccagcagggagggccgagcgaagaagtggtcctgctactttgtccgccto 

40 atccagtctatgagctgctgtcgtgatgctagagtaagaagttcgccagtgagtagtttccgaagagtt 

ctactggcatcgtggtatcacgctcgtcgttcggtatggcttcgttcaactctggttcccagcggtcaagcc^^ 
atgatcacccatattatgaagaaatgcagtcagctccttagggcctccgatcgttgtcagaagtaagttggccg^ 
gttgtcgctcatggtaatggcagcactacacaattctcttaccgtcatgccatccgtaagatgctWc^ 
agtactcaaccaagtcgttttgtgagtagtgtatacggcgaccaagctgctcttgcccggcgtctate^ 

45 ccgcgccacatagcagtactttgaaagtgctcatcatcgggaatcgttcttcggggcggaaagactcaaggatcttg^ 
cgctattgagatccagttcgatatagcccactcttg^ 

gtgtgcaaaaacaggcaagcaaaatgccgcaaagaagggaatgagtgcgacacgaaaatgttggatgctcatact 

cgtccttmcaatattattgaagcaWatcagggttactagtacgtctctcaaggataagtaagtaatatt 

gaggtattggacaggccgcaataaaatatcmatmcattacatctgtgtgttggtttmgtgtg^ 
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catacgctctccatcaaaacaaaacgaaacaaaacaaactagcaaaataggctgtccccagtgcaagtgcaggtgc 
cagaacatttctct (SEQ ID NO:90). 



The pGL4 backbone (NotI-NcoI) has the following sequence:
gcggccgcaaatgctaaaccactgcagtggttaccagtgcttgatcagtgaggcaccgatctcagcgatctgcctatt
tcgttcgtccatagtggcctgactccccgtcgtgtagatcactacgattcgtgagggcttaccatcaggccccagcgc 
agcaatgatgccgcgagagccgcgttcaccggcccccgatttgtcagcaatgaaccagccagcagggagggccg 
agcgaagaagtggtcctgctactttgtccgcctccatccagtctatgagctgctgtcgtgatgctagagtaagaagttc

gccagtgagtagtttccgaagagttgtggccattgctactggcatcgtggtatcacgctcgtcgttcggtatggcttcgt 
tcaactctggttcccagcggtcaagccgggtcacatgatcacccatattatgaagaaatgcagtcagctccttagggc
ctccgatcgttgtcagaagtaagttggccgcggtgttgtcgctcatggtaatggcagcactacacaattctcttaccgtc 
atgccatccgtaagatgcttttccgtgaccggcgagtactcaaccaagtcgttttgtgagtagtgtatacggcgaccaa 
gctgctcttgcccggcgtctatacgggacaacaccgcgccacatagcagtactttgaaagtgctcatcatcgggaat 
cgttcttcggggcggaaagactcaaggatcttgccgctattgagatccagttcgatatagcccactcttgcacccagtt 
gatcttcagcatcttttactttcaccagcgtttcggggtgtgcaaaaacaggcaagcaaaatgccgcaaagaaggga
atgagtgcgacacgaaaatgttggatgctcatactcgtcctttttcaatattattgaagcatttatcagggttactagtacg 
tctctcaaggataagtaagtaatattaaggtacgggaggtattggacaggccgcaataaaatatctttattttcattacat 
ctgtgtgttggttttttgtgtgaatcgatagtactaacatacgctctccatcaaaacaaaacgaaacaaaacaaactagc
aaaataggctgtccccagtgcaagtgcaggtgccagaacatttctctggcctaactggccggtacctgagctcgcta 
gcctcgaggatatcaagatctggcctcggcggccaagcttggcaatccggtactgttggtaaagccaccatgg

(SEQ ID NO:74).

Example 10 

Summary of Sequences Removed in Synthetic Genes 
Search parameters:

TFBS searches were limited to vertebrate TF binding sites. Searches
were performed by matrix family, i.e., the results show only the best match from
a family for each site. MatInspector default parameters were used for the core
and matrix similarity values (core similarity = 0.75, matrix similarity =
optimized), except for sequence MCS-1 (core similarity = 1.00, matrix similarity
= optimized).
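The following Python sketch makes the similarity thresholds above concrete. It normalizes a matrix score into the 0 to 1 range as (score - min) / (max - min), which is the general form of MatInspector-style similarity, and it reports windows at or above a cutoff such as 0.75. This is a simplified illustration under stated assumptions, not MatInspector itself. Positions are weighted equally here, whereas MatInspector also weights them by information content and applies a separate core-similarity test. The toy GATA-like matrix and the function names `similarity` and `scan` are hypothetical.

```python
from typing import Dict, List

# One {base: frequency} dict per matrix position.
Matrix = List[Dict[str, float]]


def similarity(window: str, matrix: Matrix) -> float:
    """Normalized score (score - min) / (max - min), in the range 0..1.

    Simplification: every position is weighted equally; MatInspector also
    weights positions by their information content.
    """
    if len(window) != len(matrix):
        raise ValueError("window length must equal matrix length")
    score = sum(col[base] for col, base in zip(matrix, window.upper()))
    best = sum(max(col.values()) for col in matrix)
    worst = sum(min(col.values()) for col in matrix)
    return (score - worst) / (best - worst)


def scan(seq: str, matrix: Matrix, threshold: float) -> List[int]:
    """Return 0-based start positions whose window reaches the threshold."""
    width = len(matrix)
    return [
        i
        for i in range(len(seq) - width + 1)
        if similarity(seq[i:i + width], matrix) >= threshold
    ]


# Hypothetical toy matrix preferring the word GATA (not a MatInspector matrix).
TOY: Matrix = [
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]

if __name__ == "__main__":
    print(scan("TTGATAGG", TOY, 0.75))
```

Raising the threshold from 0.75 to 1.00, as was done for MCS-1, admits only windows that match the preferred base at every position.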

Promoter module searches included all available promoter modules 
(vertebrate and others) and were performed using default parameters (optimized 
threshold or 80% of maximum score). 
Splice site searches were performed for splice acceptor or donor
consensus sequences.



Table 31 
Sequence              Version              TFBS      Promoter   Splice sites
                                           matches   modules    (+ strand)

puro                  (not applicable)     62        5          0
hpuro                 (not applicable)     68        4          1
hpuro1                Ver 4.1 Feb 2004     4         2          1
hpuro2                Ver 4.1 Feb 2004     2         0          1

Neo                   (not applicable)     53        0          No data
hneo                  (not applicable)     61        2          3
hneo-1                Ver 3.1.2 Jun 2003   No data   No data    No data
hneo-2                Ver 3.1.2 Jun 2003   No data   No data    No data
hneo-3                Ver 3.1.2 Jun 2003   0         0          0
hneo-4                Ver 4.1 Feb 2004     7         1          0
hneo-5                Ver 4.1 Feb 2004     0         0          0

Hyg                   (not applicable)     74        3          No data
hhyg                  (not applicable)     94        4          6
hhyg-1                Ver 3.1.2 Jun 2003   No data   No data    No data
hhyg-2                Ver 3.1.2 Jun 2003   No data   No data    No data
hhyg-3                Ver 3.1.2 Jun 2003   3         0          0
hHygro                Ver 3.3 Aug 2003     5         0          0
hhyg-4                Ver 3.3 Aug 2003     4         0          0

Luc                   (not applicable)     213       11         No data
Luc+                  (not applicable)     189       7          No data
hluc+ver2A1           Ver 3.0 Nov 2002     110       7          6
hluc+ver2A2           Ver 3.0 Nov 2002     No data   No data    No data
hluc+ver2A3           Ver 3.0 Nov 2002     8         No data    0
hluc+ver2A4           Ver 3.0 Nov 2002     No data   No data    No data
hluc+ver2A5           Ver 3.0 Nov 2002     No data   No data    No data
hluc+ver2A6           Ver 3.0 Nov 2002     2         0          0
hluc+ver2A6           Ver 3.1.1 Apr 2003   4         0          0
hluc+ver2A7           Ver 3.1.1 Apr 2003   1         0          0
hluc+ver2A8           Ver 3.1.1 Apr 2003   1         0          0
hluc+ver2B1           Ver 3.0 Nov 2002     187       2          8
hluc+ver2B2           Ver 3.0 Nov 2002     No data   No data    No data
hluc+ver2B3           Ver 3.0 Nov 2002     35        No data    0
hluc+ver2B4           Ver 3.0 Nov 2002     No data   No data    No data
hluc+ver2B5           Ver 3.0 Nov 2002     No data   No data    No data
hluc+ver2B6           Ver 3.0 Nov 2002     2         0          0
hluc+ver2B6           Ver 3.1.1 Apr 2003   6         0          0
hluc+ver2B7           Ver 3.1.1 Apr 2003   2         0          0
hluc+ver2B8           Ver 3.1.1 Apr 2003   1         0          0
hluc+ver2B9           Ver 3.1.1 Apr 2003   1         0          0
hluc+ver2B10          Ver 3.1.1 Apr 2003   1         0          0

MCS-1                 Ver 2.2 Sep 2001     14        No data    (not applicable)
MCS-2                 Ver 2.2 Sep 2001     12        No data    (not applicable)
MCS-3                 Ver 2.2 Sep 2001     0         No data    (not applicable)
MCS-4                 Ver 2.3 Feb 2001     0         0          (not applicable)

Bla                   (not applicable)     No data   No data    (not applicable)
bla-1                 Ver 2.2 Sep 2001     94        1          (not applicable)
bla-2                 Ver 2.3 Feb 2001     51        No data    (not applicable)
bla-3                 Ver 2.3 Feb 2001     16        No data    (not applicable)
bla-4                 Ver 2.3 Feb 2001     14        No data    (not applicable)
bla-5                 Ver 2.3 Feb 2001     5         0          (not applicable)

pGL4B-4NN             Ver 2.4 May 2002     11        0          (not applicable)
pGL4B-4NN1            Ver 2.4 May 2002     7         No data    (not applicable)
pGL4B-4NN2            Ver 2.4 May 2002     4         0          (not applicable)
pGL4B-4NN3            Ver 2.4 May 2002     3         0          (not applicable)

SpeI-NcoI-Ver2-Start  Ver 4.0 Nov 2003     34        1          (not applicable)
SpeI-NcoI-Ver2        Ver 4.0 Nov 2003     28        1          (not applicable)
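Table 31 and the claims share one underlying comparison: a designed sequence should carry fewer regulatory sequences than the average obtained by random codon selection. The sketch below illustrates that comparison under loudly stated assumptions. "Regulatory sequences" are reduced to exact matches of a few short motif strings on one strand. The actual analysis used MatInspector matrix families, promoter modules and splice-site consensus searches. The motifs, seed, trial count and function names are all hypothetical.

```python
import random
from itertools import product
from statistics import mean
from typing import Dict, List

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA: Dict[str, str] = {
    "".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)
}
SYNONYMS: Dict[str, List[str]] = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def count_motifs(seq: str, motifs: List[str]) -> int:
    """Count overlapping exact matches of each (uppercase) motif, one strand only."""
    seq = seq.upper()
    return sum(
        sum(1 for i in range(len(seq) - len(m) + 1) if seq.startswith(m, i))
        for m in motifs
    )


def random_recoding(protein: str, rng: random.Random) -> str:
    """Choose a synonymous codon uniformly at random for each residue."""
    return "".join(rng.choice(SYNONYMS[aa]) for aa in protein)


def random_baseline(protein: str, motifs: List[str], trials: int = 1000, seed: int = 0) -> float:
    """Average motif count over `trials` random synonymous recodings."""
    rng = random.Random(seed)
    return mean(count_motifs(random_recoding(protein, rng), motifs) for _ in range(trials))


if __name__ == "__main__":
    protein = "MIEQDGLHAGSPAAWVERLF"  # residues 1-20 of SEQ ID NO:2
    # First 60 nt of the synthetic ORF in SEQ ID NO:4
    designed = "ATGATCGAGCAGGACGGCCTGCACGCCGGCAGCCCCGCCGCCTGGGTGGAGCGCCTGTTC"
    motifs = ["GGAA", "TATA", "CAAT"]  # hypothetical stand-ins for regulatory motifs
    print("designed:", count_motifs(designed, motifs))
    print("random baseline:", random_baseline(protein, motifs))
```

Drawing codons uniformly among synonyms is only one possible reading of "random selections of codons". Weighting the draw by host codon usage would be an equally plausible alternative.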



Using the 5 sequences, i.e., hluc+ver2A1, bla-1, hneo-1, hpuro-1, hhyg-1
(humanized codon usage) for analysis, TFBS from the following families were
found in 3 out of 5 sequences:
V$AHRR (AHR-arnt heterodimers and AHR-related factors)
V$ETSF (Human and murine ETS1 factors)
V$NFKB (Nuclear Factor Kappa B/c-rel)
V$VMYB (AMV-viral myb oncogene)
V$CDEF (Cell cycle regulators: Cell cycle dependent element)
V$HAND (bHLH transcription factor dimer of HAND2 and E12)
V$NRSF (Neuron-Restrictive Silencer Factor)
V$WHZF (Winged Helix and ZF5 binding sites)
V$CMYB (C-myb, cellular transcriptional activator)
V$MINI (Muscle INItiator)
V$P53F (p53 tumor suppr.-neg. regulat. of the tumor suppr. Rb)
V$ZF5F (ZF5 POZ domain zinc finger)
V$DEAF (Homolog to deformed epidermal autoregulatory factor-1 from D. melanogaster)
V$MYOD (MYOblast Determining factor)
V$PAX5 (PAX-5/PAX-9 B-cell-specific activating protein)
V$EGRF (EGR/nerve growth Factor Induced protein C & rel. fact.)
V$NEUR (NeuroD, Beta2, HLH domain)
V$REBV (Epstein-Barr virus transcription factor R);

TFBS from the following families were found in 4 out of 5 sequences:
V$ETSF (Human and murine ETS1 factors)
V$CDEF (Cell cycle regulators: Cell cycle dependent element)
V$HAND (bHLH transcription factor dimer of HAND2 and E12)
V$NRSF (Neuron-Restrictive Silencer Factor)
V$PAX5 (PAX-5/PAX-9 B-cell-specific activating protein)
V$NEUR (NeuroD, Beta2, HLH domain); and

TFBS from the following families were found in 5 out of 5 sequences:
V$PAX5 (PAX-5/PAX-9 B-cell-specific activating protein).
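A summary like the one above can be produced by counting, for each matrix family, how many of the analysed sequences contain at least one match. The sketch below assumes the per-sequence family hits are already available as sets. The hit sets in the example are hypothetical placeholders, not the Example 10 results. Both an exact count and a cumulative (at least k) count are shown, since either convention can be used to compile such lists.

```python
from collections import Counter
from typing import Dict, List, Set


def family_counts(hits: Dict[str, Set[str]]) -> Counter:
    """Number of sequences in which each family has at least one match."""
    return Counter(fam for fams in hits.values() for fam in set(fams))


def families_in_exactly(hits: Dict[str, Set[str]], k: int) -> List[str]:
    """Families matched in exactly k sequences, sorted by name."""
    return sorted(f for f, n in family_counts(hits).items() if n == k)


def families_in_at_least(hits: Dict[str, Set[str]], k: int) -> List[str]:
    """Families matched in k or more sequences, sorted by name."""
    return sorted(f for f, n in family_counts(hits).items() if n >= k)


if __name__ == "__main__":
    # Hypothetical placeholder hits, not the Example 10 results.
    hits = {
        "seq1": {"FAM_A", "FAM_B"},
        "seq2": {"FAM_A"},
        "seq3": {"FAM_A", "FAM_B", "FAM_C"},
    }
    for k in range(1, len(hits) + 1):
        print(k, families_in_exactly(hits, k), families_in_at_least(hits, k))
```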
All publications, patents and patent applications are incorporated herein
by reference. While in the foregoing specification, this invention has been
described in relation to certain preferred embodiments thereof, and many details
have been set forth for purposes of illustration, it will be apparent to those skilled
in the art that the invention is susceptible to additional embodiments and that
certain of the details herein may be varied considerably without departing from
the basic principles of the invention.
WHAT IS CLAIMED IS:

1. An isolated nucleic acid molecule comprising a synthetic nucleotide
sequence having a coding region for a selectable polypeptide, wherein the
synthetic nucleotide sequence has 90% or less nucleic acid sequence
identity to a parent nucleic acid sequence encoding a corresponding
selectable polypeptide, wherein the decreased sequence identity is a result
of different codons in the synthetic nucleotide sequence relative to the
codons in the parent nucleic acid sequence, wherein the nucleotide
sequence encodes a selectable polypeptide with at least 85% amino acid
sequence identity to the corresponding selectable polypeptide encoded by
the parent nucleic acid sequence, wherein the synthetic nucleotide
sequence has a reduced number of regulatory sequences relative to the
average number of regulatory sequences resulting from random selections
of codons at the sequences which differ between the synthetic nucleotide
sequence and the parent nucleic acid sequence, and wherein the synthetic
nucleotide sequence, when expressed in a cell, confers resistance to
ampicillin, puromycin, hygromycin or neomycin.

2. The isolated nucleic acid molecule of claim 1 wherein the regulatory
sequences include transcription factor binding sequences, intron splice
sites, poly(A) sites, promoter modules, and/or promoter sequences.

3. The isolated nucleic acid molecule of claim 1 wherein a majority of the
codons which differ are ones that are preferred codons of a desired host
cell and/or are not low-usage codons in that host cell.

4. The isolated nucleic acid molecule of claim 3 wherein the majority of the
codons which differ in the synthetic nucleic acid sequence are those
which are employed more frequently in mammals.
5. The isolated nucleic acid molecule of claim 3 wherein the majority of the
codons which differ in the synthetic nucleic acid sequence are those
which are preferred codons in humans.

6. The isolated nucleic acid molecule of claim 3 wherein the majority of
codons which differ are the codons CGC, CTG, AGC, ACC, CCC, GCC,
GGC, GTG, ATC, AAG, AAC, CAG, CAC, GAG, GAC, TAC, TGC
and TTC.



7. The isolated nucleic acid molecule of claim 1 wherein the nucleic acid
molecule encodes a fusion of the selectable polypeptide with a luciferase.

8. The isolated nucleic acid molecule of claim 7 wherein the luciferase is a
Renilla luciferase, a firefly luciferase or a click beetle luciferase.

9. The isolated nucleic acid molecule of claim 1 wherein the parent nucleic
acid sequence is a wild-type neo, hyg, bla or puro sequence.

10. The isolated nucleic acid molecule of claim 1 wherein the parent nucleic
acid sequence is SEQ ID NO:1, SEQ ID NO:6, SEQ ID NO:15 or SEQ
ID NO:41.



11. The isolated nucleic acid molecule of claim 1 wherein the synthetic
nucleotide sequence comprises an open reading frame in SEQ ID NO:4,
SEQ ID NO:5, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID
NO:30, SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:42, SEQ ID
NO:44, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID
NO:73, SEQ ID NO:74, SEQ ID NO:80, SEQ ID NO:81, SEQ ID
NO:82, SEQ ID NO:83, or SEQ ID NO:84.

12. The isolated nucleic acid molecule of claim 1 wherein the synthetic
nucleotide sequence has at least 10% fewer regulatory sequences.
13. The isolated nucleic acid molecule of claim 1 wherein the synthetic
nucleotide sequence has an increased number of AGC serine-encoding
codons, an increased number of ATC isoleucine-encoding codons, an
increased number of CCC proline-encoding codons, and/or an increased
number of ACC threonine-encoding codons.

14. The isolated nucleic acid molecule of claim 1 wherein the codons in the
synthetic nucleotide sequence which differ encode the same amino acids
as the corresponding codons in the parent nucleic acid sequence.

15. The isolated nucleic acid molecule of claim 1 which has at least 90%
nucleotide sequence identity to an open reading frame in any one of SEQ
ID NO:4, SEQ ID NO:5, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11,
SEQ ID NO:30, SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:42, SEQ
ID NO:44, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID
NO:73, SEQ ID NO:74, SEQ ID NO:80, SEQ ID NO:81, SEQ ID
NO:82, SEQ ID NO:83, or SEQ ID NO:84, or the complement thereof.

16. The isolated nucleic acid molecule of claim 1 wherein the nucleic acid
molecule encodes a fusion of the selectable polypeptide with one or more
other peptides or polypeptides, wherein at least the selectable polypeptide
is encoded by the synthetic nucleic acid sequence.

17. The isolated nucleic acid molecule of claim 16 wherein one or more other
peptides are peptides having protein destabilization sequences.

18. A plasmid comprising the nucleic acid molecule of claim 1.

19. The plasmid of claim 18 which further comprises a multiple cloning
region.

20. The plasmid of claim 18 which further comprises an open reading frame
of interest.
21. The plasmid of claim 18 which further comprises a promoter functional
in a particular host cell operably linked to the synthetic nucleotide
sequence.

22. The plasmid of claim 21 wherein the promoter is functional in a
prokaryotic cell.

23. The plasmid of claim 21 wherein the promoter is functional in a
eukaryotic cell.

24. The plasmid of claim 20 further comprising a promoter operably linked
to the open reading frame of interest.

25. An isolated nucleic acid molecule comprising a synthetic nucleotide
sequence encoding a firefly luciferase, wherein the synthetic nucleotide
sequence has 80% or less nucleic acid sequence identity to a parent
nucleic acid sequence having SEQ ID NO:43 or 85% or less nucleic acid
sequence identity to a parent nucleic acid sequence having SEQ ID
NO:14 which encodes a firefly luciferase, wherein the decreased
sequence identity is a result of different codons in the synthetic
nucleotide sequence relative to the codons in the parent nucleic acid
sequence, wherein the synthetic nucleotide sequence encodes a firefly
luciferase which has at least 85% amino acid sequence identity to the
corresponding luciferase encoded by the parent nucleic acid sequence,
and wherein the synthetic nucleotide sequence has a reduced number of
regulatory sequences relative to the average number of regulatory
sequences resulting from random selections of codons at the sequences
which differ between the synthetic nucleotide sequence and the parent
nucleic acid sequence.

26. The isolated nucleic acid molecule of claim 25 wherein the regulatory
sequences include transcription factor binding sequences, intron splice
sites, poly(A) sites, promoter modules, and/or promoter sequences.

27. The isolated nucleic acid molecule of claim 25 wherein a majority of the
codons which differ are ones that are preferred codons of a desired host
cell and/or are not low-usage codons in that host cell.

28. The isolated nucleic acid molecule of claim 27 wherein the majority of
the codons which differ in the synthetic nucleic acid molecule are those
which are employed more frequently in mammals.

29. The isolated nucleic acid molecule of claim 27 wherein the majority of
the codons which differ in the synthetic nucleic acid molecule are those
which are preferred codons in humans.

30. The isolated nucleic acid molecule of claim 27 wherein the majority of
codons which differ are the codons CGC, CTG, AGC, ACC, CCC, GCC,
GGC, GTG, ATC, AAG, AAC, CAG, CAC, GAG, GAC, TAC, TGC
and TTC.

31. The isolated nucleic acid molecule of claim 25 wherein the synthetic
nucleotide sequence comprises a sequence in an open reading frame in
SEQ ID NO:21, SEQ ID NO:22, or SEQ ID NO:23 or has at least 90%
nucleotide sequence identity thereto.

32. The isolated nucleic acid molecule of claim 25 wherein the synthetic
nucleic acid molecule is expressed in a mammalian host cell at a level
which is greater than that of the parent nucleic acid sequence.

33. The isolated nucleic acid molecule of claim 25 wherein the synthetic
nucleic acid molecule has an increased number of AGC serine-encoding
codons, an increased number of CCC proline-encoding codons, an
increased number of ATC isoleucine-encoding codons and/or an
increased number of ACC threonine-encoding codons.
34. The isolated nucleic acid molecule of claim 25 wherein the synthetic nucleotide
sequence has at least 10% fewer transcription regulatory sequences.

35. The isolated nucleic acid molecule of claim 25 wherein the codons in the
synthetic nucleotide sequence which differ encode the same amino acids
as the corresponding codons in the parent nucleic acid sequence.

36. The isolated nucleic acid molecule of claim 25 wherein the nucleic acid
molecule encodes a fusion of the luciferase with one or more other
peptides or polypeptides, wherein at least the luciferase is encoded by the
synthetic nucleic acid sequence.

37. The isolated nucleic acid molecule of claim 36 wherein one or more other
peptides are peptides having protein destabilization sequences.

38. A plasmid comprising the nucleic acid molecule of claim 25.

39. The plasmid of claim 38 which further comprises a multiple cloning
region.

40. The plasmid of claim 38 which further comprises a promoter operatively
linked to the synthetic nucleotide sequence.

41. The plasmid of claim 38 which further comprises the synthetic nucleotide
sequence of the nucleic acid molecule of claim 1.

42. An expression vector comprising the nucleic acid molecule of claim 25
linked to a promoter functional in a cell.

43. The expression vector of claim 42 wherein the promoter is functional in a
eukaryotic cell.
44. The expression vector of claim 42 wherein the expression vector further
comprises a multiple cloning site.

45. The expression vector of claim 42 wherein the promoter is functional in a
mammalian cell.

46. The expression vector of claim 42 wherein the synthetic nucleotide
sequence is operatively linked to a Kozak consensus sequence.

47. A plasmid comprising a nucleotide sequence comprising SEQ ID NO:74
or a nucleotide sequence comprising at least 80% nucleic acid sequence
identity to SEQ ID NO:74, which nucleotide sequence comprises an open
reading frame with less than 90% nucleic acid sequence identity to SEQ
ID NO:41, and the expression of which open reading frame in a host cell
confers resistance to ampicillin.

48. A host cell comprising the expression cassette of claim 42.

49. A host cell comprising the plasmid of claim 17, 38 or 47.

50. A kit comprising, in suitable container means, the plasmid of claim 17,
38 or 47.

51. A polynucleotide which hybridizes under stringent hybridization
conditions to SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:9, SEQ ID
NO:10, SEQ ID NO:11, SEQ ID NO:30, SEQ ID NO:38, SEQ ID
NO:39, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:70, SEQ ID
NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:74, SEQ ID
NO:80, SEQ ID NO:81, SEQ ID NO:82, SEQ ID NO:83, SEQ ID
NO:84, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, or the
complement of the polynucleotide, wherein the polynucleotide or the
complement thereof encodes a selectable polypeptide or a firefly
luciferase.

224 



WO 2006/034061 



PCT/US2005/033218



52. The polynucleotide of claim 51 which does not have SEQ ID NO:1, SEQ
ID NO:6, SEQ ID NO:15, SEQ ID NO:41, SEQ ID NO:14, or SEQ ID
NO:43.

53. An isolated nucleic acid molecule comprising a synthetic nucleotide
sequence which does not code for a desirable peptide or polypeptide but
includes sequences which inhibit transcription and/or translation, wherein
the synthetic nucleotide sequence has at least 20 nucleotides which have
a different sequence relative to a corresponding parent nucleic acid
sequence which does not code for the desirable peptide or polypeptide,
wherein the synthetic nucleotide sequence has 90% or less nucleic acid
sequence identity to the parent nucleic acid sequence, and wherein the
sequence difference is a result of a reduced number of one or more
regulatory sequences in the synthetic nucleotide sequence relative to the
parent nucleic acid sequence.

54. The isolated nucleic acid molecule of claim 53 wherein the synthetic
nucleotide sequence has SEQ ID NO:49.

55. The isolated nucleic acid molecule of claim 53 further comprising a
multiple cloning region and/or a poly(A) site.

56. The isolated nucleic acid molecule of claim 53 wherein the sequences
which inhibit transcription include one or more poly(A) sites.

57. The isolated nucleic acid molecule of claim 53 wherein the sequences
which inhibit translation include one or more stop codons in one or more
reading frames.

58. The isolated nucleic acid molecule of claim 53 wherein the parent nucleic
acid sequence includes a multiple cloning region.
59. The isolated nucleic acid molecule of claim 53 wherein the parent nucleic
acid sequence includes sequences which inhibit transcription and/or
translation.

60. The isolated nucleic acid molecule of claim 53 wherein the parent nucleic
acid sequence has SEQ ID NO:76.

61. The isolated nucleic acid molecule of claim 53 wherein the synthetic
nucleotide sequence has a reduced number of one or more restriction
endonuclease recognition sites relative to the parent nucleic acid
sequence.

62. A plasmid comprising the nucleic acid molecule of claim 53.

63. A plasmid which includes a sequence including SEQ ID NO:89, SEQ ID
NO:90, or a sequence having at least 90% nucleic acid sequence identity
thereto, or the complement thereof, which sequence encodes at least one
selectable and/or screenable polypeptide.

64. The plasmid of claim 63 further comprising a multiple cloning region.

65. The plasmid of claim 63 further comprising another selectable or
screenable polypeptide.

66. The plasmid of claim 63 or 65 wherein at least one selectable or
screenable polypeptide comprises one or more protein destabilization
sequences.

67. The plasmid of claim 63 wherein the sequence for the at least one
selectable and/or screenable polypeptide is not SEQ ID NO:41.

68. A synthetic nucleotide sequence of at least 100 nucleotides having a
coding region for a selectable polypeptide which confers resistance to
ampicillin, puromycin, hygromycin or neomycin, wherein the synthetic
nucleotide sequence has 90% or less nucleic acid sequence identity to a
corresponding region of a parent nucleic acid sequence for the selectable
polypeptide, wherein the decreased sequence identity is a result of
different codons in the synthetic nucleotide sequence relative to the
codons in the corresponding region in the parent nucleic acid sequence,
wherein the synthetic nucleotide sequence has a reduced number of
regulatory sequences relative to the average number of regulatory
sequences resulting from random selections of codons at the sequences
which differ between the synthetic nucleotide sequence and the parent
nucleic acid sequence.

69. An isolated nucleic acid molecule encoding a selectable polypeptide and
comprising a synthetic nucleotide sequence of at least 100 nucleotides
having a coding region for the selectable polypeptide, wherein the
synthetic nucleotide sequence has 90% or less nucleic acid sequence
identity to a corresponding region in a parent nucleic acid sequence for
the selectable polypeptide, wherein the decreased sequence identity is a
result of different codons in the synthetic nucleotide sequence relative to
the codons in the parent nucleic acid sequence, wherein the synthetic
nucleotide sequence encodes a region of the selectable polypeptide with
at least 85% amino acid sequence identity to the corresponding region of
the selectable polypeptide encoded by the parent nucleic acid sequence,
wherein the synthetic nucleotide sequence has a reduced number of
regulatory sequences relative to the average number of regulatory
sequences resulting from random selections of codons at the sequences
which differ between the synthetic nucleotide sequence and the parent
nucleic acid sequence, and wherein the isolated nucleic acid molecule,
when expressed in a cell, confers resistance to ampicillin, puromycin,
hygromycin or neomycin.
Figure 1

Amino Acid    Codon
Phe           UUU, UUC
Ser           UCU, UCC, UCA, UCG, AGU, AGC
Tyr           UAU, UAC
Cys           UGU, UGC
Leu           UUA, UUG, CUU, CUC, CUA, CUG
Trp           UGG
Pro           CCU, CCC, CCA, CCG
His           CAU, CAC
Arg           CGU, CGC, CGA, CGG, AGA, AGG
Gln           CAA, CAG
Ile           AUU, AUC, AUA
Thr           ACU, ACC, ACA, ACG
Asn           AAU, AAC
Lys           AAA, AAG
Met           AUG
Val           GUU, GUC, GUA, GUG
Ala           GCU, GCC, GCA, GCG
Asp           GAU, GAC
Gly           GGU, GGC, GGA, GGG
Glu           GAA, GAG
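Figure 1 shows the degeneracy that the synthetic genes exploit. Most amino acids have several codons, so a coding sequence can be changed extensively without changing the encoded protein. The short Python sketch below makes this concrete. It compares the first 60 nucleotides of the parent neo ORF (SEQ ID NO:1) with the corresponding stretch of the synthetic ORF in SEQ ID NO:4, computes their percent identity, and checks that both encode the same peptide. The helper names are illustrative. Identity here is a simple position-by-position comparison of equal-length sequences, not the alignment-based identity a full analysis would use. The code uses the DNA alphabet, so each U in Figure 1 corresponds to T.

```python
from itertools import product
from typing import List

BASES = "TCAG"
# Standard genetic code (NCBI table 1); DNA alphabet, so U in Figure 1 is T here.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(seq: str) -> List[str]:
    """Split an in-frame sequence into complete codons."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def translate(seq: str) -> str:
    return "".join(CODON_TABLE[c] for c in codons(seq))


def percent_identity(a: str, b: str) -> float:
    """Ungapped, position-by-position identity of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("ungapped comparison needs equal lengths")
    same = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * same / len(a)


# First 60 nt of the parent neo ORF (SEQ ID NO:1) and of the synthetic ORF (SEQ ID NO:4)
PARENT = "atgattgaacaagatggattgcacgcaggttctccggccgcttgggtggagaggctattc"
SYNTHETIC = "atgatcgagcaggacggcctgcacgccggcagccccgccgcctgggtggagcgcctgttc"

if __name__ == "__main__":
    print(translate(PARENT), translate(SYNTHETIC))
    print(f"{percent_identity(PARENT, SYNTHETIC):.1f}% identity")
    changed = sum(p != s for p, s in zip(codons(PARENT), codons(SYNTHETIC)))
    print(f"{changed} of {len(codons(PARENT))} codons differ")
```

Over this stretch, 13 of the 20 codons differ while the peptide MIEQDGLHAGSPAAWVERLF is unchanged. This is the kind of synonymous divergence the claims describe.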
SEQUENCE LISTING

<110> Promega Corporation
Wood, Keith
Wood, Monika
Almond, Brian
Paguio, Aileen
Fan, Frank

<120> Synthetic nucleic acid molecule and methods of preparation
<130> 341.034WO1

<160> 97

<170> FastSEQ for Windows Version 4.0

<210> 1
<211> 795
<212> DNA
<213> Unknown

<220>
<223> Neo from neomycin gene from Promega's pCI-neo.



<400> 1 

atgattgaac aagatggatt gcacgcaggt tctccggccg cttgggtgga gaggctattc 60
ggctatgact gggcacaaca gacaatcggc tgctctgatg ccgccgtgtt ccggctgtca 120
gcgcaggggc gcccggttct ttttgtcaag accgacctgt ccggtgccct gaatgaactg 180
caggacgagg cagcgcggct atcgtggctg gccacgacgg gcgttccttg cgcagctgtg 240
ctcgacgttg tcactgaagc gggaagggac tggctgctat tgggcgaagt gccggggcag 300
gatctcctgt catctcacct tgctcctgcc gagaaagtat ccatcatggc tgatgcaatg 360
cggcggctgc atacgcttga tccggctacc tgcccattcg accaccaagc gaaacatcgc 420
atcgagcgag cacgtactcg gatggaagcc ggtcttgtcg atcaggatga tctggacgaa 480
gagcatcagg ggctcgcgcc agccgaactg ttcgccaggc tcaaggcgcg catgcccgac 540
ggcgaggatc tcgtcgtgac ccatggcgat gcctgcttgc cgaatatcat ggtggaaaat 600
ggccgctttt ctggattcat cgactgtggc cggctgggtg tggcggaccg ctatcaggac 660
atagcgttgg ctacccgtga tattgctgaa gagcttggcg gcgaatgggc tgaccgcttc 720
ctcgtgcttt acggtatcgc cgctcccgat tcgcagcgca tcgccttcta tcgccttctt 780
gacgagttct tctga 795
<210> 2
<211> 264
<212> PRT
<213> Unknown

<220>
<223> Neo from neomycin gene from Promega's pCI-neo.

<400> 2
Met Ile Glu Gln Asp Gly Leu His Ala Gly Ser Pro Ala Ala Trp Val
1               5                   10                  15
Glu Arg Leu Phe Gly Tyr Asp Trp Ala Gln Gln Thr Ile Gly Cys Ser
            20                  25                  30
Asp Ala Ala Val Phe Arg Leu Ser Ala Gln Gly Arg Pro Val Leu Phe
        35                  40                  45
Val Lys Thr Asp Leu Ser Gly Ala Leu Asn Glu Leu Gln Asp Glu Ala
    50                  55                  60
Ala Arg Leu Ser Trp Leu Ala Thr Thr Gly Val Pro Cys Ala Ala Val
65                  70                  75                  80
Leu Asp Val Val Thr Glu Ala Gly Arg Asp Trp Leu Leu Leu Gly Glu
                85                  90                  95
Val Pro Gly Gln Asp Leu Leu Ser Ser His Leu Ala Pro Ala Glu Lys
            100                 105                 110
Val Ser Ile Met Ala Asp Ala Met Arg Arg Leu His Thr Leu Asp Pro
        115                 120                 125
Ala Thr Cys Pro Phe Asp His Gln Ala Lys His Arg Ile Glu Arg Ala
    130                 135                 140
Arg Thr Arg Met Glu Ala Gly Leu Val Asp Gln Asp Asp Leu Asp Glu
145                 150                 155                 160
Glu His Gln Gly Leu Ala Pro Ala Glu Leu Phe Ala Arg Leu Lys Ala
                165                 170                 175
Arg Met Pro Asp Gly Glu Asp Leu Val Val Thr His Gly Asp Ala Cys
            180                 185                 190
Leu Pro Asn Ile Met Val Glu Asn Gly Arg Phe Ser Gly Phe Ile Asp
        195                 200                 205
Cys Gly Arg Leu Gly Val Ala Asp Arg Tyr Gln Asp Ile Ala Leu Ala
    210                 215                 220
Thr Arg Asp Ile Ala Glu Glu Leu Gly Gly Glu Trp Ala Asp Arg Phe
225                 230                 235                 240
Leu Val Leu Tyr Gly Ile Ala Ala Pro Asp Ser Gln Arg Ile Ala Phe
                245                 250                 255
Tyr Arg Leu Leu Asp Glu Phe Phe
            260
<210> 3
<211> 825
<212> DNA
<213> Artificial Sequence

<220>
<223> A synthetic construct.

<400> 3
ccactcagtg gccaccatga tcgagcagga cggcctgcac gccggcagcc ccgccgcctg 60
ggtggagcgc ctgttcggct acgactgggc ccagcagacc atcggctgca gcgacgccgc 120
cgtgttccgc ctgagcgccc agggccgccc cgtgctgttc gtgaagaccg acctgagcgg 180
cgccctgaac gagctgcagg acgaggccgc ccgcctgagc tggctggcca ccaccggcgt 240
gccctgcgcc gccgtgctgg acgtggtgac cgaggccggc cgcgactggc tgctgctggg 300
cgaggtgccc ggccaggacc tgctgagcag ccacctggcc cccgccgaga aggtgagcat 360
catggccgac gccatgcgcc gcctgcacac cctggacccc gccacctgcc ccttcgacca 420
ccaggccaag caccgcatcg agcgcgcccg cacccgcatg gaggccggcc tggtggacca 480
ggacgacctg gacgaggagc accagggcct ggcccccgcc gagctgttcg cccgcctgaa 540
ggcccgcatg cccgacggcg aggacctggt ggtgacccac ggcgacgcct gcctgcccaa 600
catcatggtg gagaacggcc gcttcagcgg cttcatcgac tgcggccgcc tgggcgtggc 660
cgaccgctac caggacatcg ccctggccac ccgcgacatc gccgaggagc tgggcggcga 720
gtgggccgac cgcttcctgg tgctgtacgg catcgccgcc cccgacagcc agcgcatcgc 780
cttctaccgc ctgctggacg agttcttcta ataaccagtc tctgg 825



<210> 4
<211> 825
<212> DNA
<213> Artificial Sequence

<220>
<223> A synthetic construct.

<400> 4
ccactcagtg gccaccatga tcgagcagga cggcctgcac gccggcagcc ccgccgcctg 60
ggtggagcgc ctgttcggct acgactgggc ccagcagacc atcggctgca gcgacgccgc 120
cgtgttccgc ctgagcgccc agggccgccc cgtgctgttc gtgaagaccg acctgagcgg 180
cgccctgaac gagctgcagg acgaggccgc ccgcctgagc tggctggcca ccaccggcgt 240
gccctgcgcc gccgtgctgg acgtggtgac cgaggccggc cgcgactggc tgctgctggg 300
cgaggtgccc ggccaggacc tgctgagcag ccacctggcc cccgccgaga aggtgagcat 360
catggccgac gccatgcgcc gcctgcacac cctggacccc gccacctgcc ccttcgacca 420
ccaggccaag caccgcatcg agcgcgcccg cacccgcatg gaggccggcc tggtggacca 480
ggacgacctg gacgaggagc accagggcct ggcccccgcc gagctgttcg cccgcctgaa 540
ggcccgcatg cccgacggcg aggacctggt ggtgacccac ggcgacgcct gcctgcccaa 600
catcatggtg gagaacggcc gcttcagcgg cttcatcgac tgcggccgcc tgggcgtggc 660
cgaccgctac caggacatcg ccctggccac ccgcgacatc gccgaggagc tgggcggcga 720
gtgggccgac cgcttcctgg tgctgtacgg catcgccgcc cccgacagcc agcgcatcgc 780
cttctaccgc ctgctggacg agttcttcta ataaccagtc tctgg 825

<210> 5 

<211> 818 

<212> DNA 

<213> Artificial Sequence

<220>

<223> A synthetic construct.

<400> 5



cctgcaggcc accatgatcg aacaagacgg cctccatgct ggcagtcccg cagcttgggt 60
cgaacgcttg ttcgggtacg actgggccca gcagaccatc ggatgtagcg atgcggccgt 120
gttccgtcta agcgctcaag gccggcccgt gctgttcgtg aagaccgacc tgagcggcgc 180
cctgaacgag cttcaagacg aggctgcccg cctgagctgg ctggccacca ccggtgtacc 240
ctgcgccgct gtgttggatg ttgtgaccga agccggccgg gactggctgc tgctgggcga 300
ggtccctggc caggatctgc tgagcagcca ccttgccccc gctgagaagg tttccatcat 360
ggccgatgca atgcggcgcc tgcacaccct ggaccccgct acatgcccct tcgaccacca 420
ggctaagcat cggatcgagc gtgctcggac ccgcatggag gccggcctgg tggaccagga 480
cgacctggac gaggagcatc agggcctggc ccccgctgaa ctgttcgccc gcctgaaagc 540
ccgcatgccg gacggtgagg acctggttgt gacacatggt gatgcctgcc tccctaacat 600
catggtcgag aatggccgct tctccggctt catcgactgc ggtcgcctag gagttgccga 660
ccgctaccag gacatcgccc tggccacccg cgacatcgct gaggagcttg gcggcgagtg 720
ggccgaccgc ttcttagtct tgtacggcat cgcagctccc gacagccagc gcatcgcctt 780
ctaccgcctg ctcgacgagt tcttttaatg agcttaag 818




<210> 6 
<211> 1024 
<212> DNA 

<213> Escherichia coli 


<400> 6 

atgaaaaagc ctgaactcac cgcgacgtct gtcgagaagt ttctgatcga aaagttcgac 60 
agcgtctccg acctgatgca gctctcggag ggcgaagaat ctcgtgcttt cagcttcgat 120 
gtaggagggc gtggatatgt cctgcgggta aatagctgcg ccgatggttt ctacaaagat 180
cgttatgttt atcggcactt tgcatcggcc gcgctcccga ttccggaagt gcttgacatt 240
ggggaattca gcgagagcct gacctattgc atctcccgcc gtgcacaggg tgtcacgttg 300 
caagacctgc ctgaaaccga actgcccgct gttctgcagc cggtcgcgga ggccatggat 360 
gcgatcgctg cggccgatct tagccagacg agcgggttcg gcccattcgg accgcaagga 420
atcggtcaat acactacatg gcgtgatttc atatgcgcga ttgctgatcc ccatgtgtat 480

cactggcaaa ctgtgatgga cgacaccgtc agtgcgtccg tcgcgcaggc tctcgatgag 540 

ctgatgcttt gggccgagga ctgccccgaa gtccggcacc tcgtgcacgc ggatttcggc 600 

tccaacaatg tcctgacgga caatggccgc ataacagcgg tcattgactg gagcgaggcg 660 

atgttcgggg attcccaata cgaggtcgcc aacatcttct tctggaggcc gtggttggct 720

tgtatggagc agcagacgcg ctacttcgag cggaggcatc cggagcttgc aggatcgccg 780 

cggctccggg cgtatatgct ccgcattggt cttgaccaac tctatcagag cttggttgac 840 

ggcaatttcg atgatgcagc ttgggcgcag ggtcgatgcg acgcaatcgt ccgatccgga 900 

gccgggactg tcgggcgtac acaaatcgcc cgcagaagcg cggccgtctg gaccgatggc 960 

tgtgtagaag tactcgccga tagtggaaac cgacgcccca gcactcgtcc gagggcaaag 1020

gaat 1024 



<210> 7 

<211> 341 

<212> PRT

<213> Escherichia coli 



<400> 7 

Met Lys Lys Pro Glu Leu Thr Ala Thr Ser Val Glu Lys Phe Leu Ile
1 5 10 15
Glu Lys Phe Asp Ser Val Ser Asp Leu Met Gln Leu Ser Glu Gly Glu
20 25 30
Glu Ser Arg Ala Phe Ser Phe Asp Val Gly Gly Arg Gly Tyr Val Leu
35 40 45
Arg Val Asn Ser Cys Ala Asp Gly Phe Tyr Lys Asp Arg Tyr Val Tyr
50 55 60
Arg His Phe Ala Ser Ala Ala Leu Pro Ile Pro Glu Val Leu Asp Ile
65 70 75 80
Gly Glu Phe Ser Glu Ser Leu Thr Tyr Cys Ile Ser Arg Arg Ala Gln
85 90 95
Gly Val Thr Leu Gln Asp Leu Pro Glu Thr Glu Leu Pro Ala Val Leu
100 105 110
Gln Pro Val Ala Glu Ala Met Asp Ala Ile Ala Ala Ala Asp Leu Ser
115 120 125
Gln Thr Ser Gly Phe Gly Pro Phe Gly Pro Gln Gly Ile Gly Gln Tyr
130 135 140
Thr Thr Trp Arg Asp Phe Ile Cys Ala Ile Ala Asp Pro His Val Tyr
145 150 155 160
His Trp Gln Thr Val Met Asp Asp Thr Val Ser Ala Ser Val Ala Gln
165 170 175
Ala Leu Asp Glu Leu Met Leu Trp Ala Glu Asp Cys Pro Glu Val Arg
180 185 190
His Leu Val His Ala Asp Phe Gly Ser Asn Asn Val Leu Thr Asp Asn
195 200 205
Gly Arg Ile Thr Ala Val Ile Asp Trp Ser Glu Ala Met Phe Gly Asp
210 215 220
Ser Gln Tyr Glu Val Ala Asn Ile Phe Phe Trp Arg Pro Trp Leu Ala
225 230 235 240
Cys Met Glu Gln Gln Thr Arg Tyr Phe Glu Arg Arg His Pro Glu Leu
245 250 255
Ala Gly Ser Pro Arg Leu Arg Ala Tyr Met Leu Arg Ile Gly Leu Asp
260 265 270
Gln Leu Tyr Gln Ser Leu Val Asp Gly Asn Phe Asp Asp Ala Ala Trp
275 280 285
Ala Gln Gly Arg Cys Asp Ala Ile Val Arg Ser Gly Ala Gly Thr Val
290 295 300
Gly Arg Thr Gln Ile Ala Arg Arg Ser Ala Ala Val Trp Thr Asp Gly
305 310 315 320
Cys Val Glu Val Leu Ala Asp Ser Gly Asn Arg Arg Pro Ser Thr Arg
325 330 335
Pro Arg Ala Lys Glu
340



<210> 8 
<211> 1056 
<212> DNA 
<213> Artificial Sequence



<220> 

<223> A synthetic construct. 



<400> 8

ccactcagtg gccaccatga agaagcccga gctgaccgcc accagcgtgg agaagttcct 60
gatcgagaag ttcgacagcg tgagcgacct gatgcagctg agcgagggcg aggagagccg 120
cgccttcagc ttcgacgtgg gcggccgcgg ctacgtgctg cgcgtgaaca gctgcgccga 180
cggcttctac aaggaccgct acgtgtaccg ccacttcgcc agcgccgccc tgcccatccc 240
cgaggtgctg gacatcggcg agttcagcga gagcctgacc tactgcatca gccgccgcgc 300
ccagggcgtg accctgcagg acctgcccga gaccgagctg cccgccgtgc tgcagcccgt 360
ggccgaggcc atggacgcca tcgccgccgc cgacctgagc cagaccagcg gcttcggccc 420
cttcggcccc cagggcatcg gccagtacac cacctggcgc gacttcatct gcgccatcgc 480
cgacccccac gtgtaccact ggcagaccgt gatggacgac accgtgagcg ccagcgtggc 540
ccaggccctg gacgagctga tgctgtgggc cgaggactgc cccgaggtgc gccacctggt 600
gcacgccgac ttcggcagca acaacgtgct gaccgacaac ggccgcatca ccgccgtgat 660
cgactggagc gaggccatgt tcggcgacag ccagtacgag gtggccaaca tcttcttctg 720
gcgcccctgg ctggcctgca tggagcagca gacccgctac ttcgagcgcc gccaccccga 780
gctggccggc agcccccgcc tgcgcgccta catgctgcgc atcggcctgg accagctgta 840
ccagagcctg gtggacggca acttcgacga cgccgcctgg gcccagggcc gctgcgacgc 900
catcgtgcgc agcggcgccg gcaccgtggg ccgcacccag atcgcccgcc gcagcgccgc 960
cgtgtggacc gacggctgcg tggaggtgct ggccgacagc ggcaaccgcc gccccagcac 1020
ccgcccccgc gccaaggagt aataaccagc tcttgg 1056

<210> 9 
<211> 1056 
<212> DNA 
<213> Artificial Sequence

<220> 

<223> A synthetic construct. 



<400> 9



ccactccgtg gccaccatga agaagcccga gctgaccgct accagcgttg aaaaatttct 60
catcgagaag ttcgacagtg tgagcgacct gatgcagttg tcggagggcg aagagagccg 120
agccttcagc ttcgatgtcg gcggacgcgg ctatgtactg cgggtgaata gctgcgctga 180
tggcttctac aaagaccgct acgtgtaccg ccacttcgcc agcgctgcac tacccatccc 240
cgaagtgttg gacatcggcg agttcagcga gagcctgaca tactgcatca gtagacgcgc 300
ccaaggcgtt actctccaag acctccccga aacagagctg cctgctgtgt tacagcctgt 360
cgccgaagct atggatgcta ttgccgccgc cgacctcagt caaaccagcg gcttcggccc 420
attcgggccc caaggcatcg gccagtacac aacctggcgg gatttcattt gcgccattgc 480
tgatccccat gtctaccact ggcagaccgt gatggacgac accgtgtccg ccagcgtagc 540
tcaagccctg gacgaactga tgctgtgggc cgaagactgt cccgaggtgc gccacctcgt 600
ccatgccgac ttcggcagca acaacgtcct gaccgacaac ggccgcatca ccgccgtaat 660
cgactggtcc gaagctatgt tcggggacag tcagtacgag gtggccaaca tcttcttctg 720
gcggccctgg ctggcttgca tggagcagca gactcgctac ttcgagcgcc ggcatcccga 780
gctggccggc agccctcgtc tgcgagccta catgctgcgc atcggcctgg atcagctcta 840
ccagagcctc gtggacggca acttcgacga tgctgcctgg gctcaaggcc gctgcgatgc 900
catcgtccgc agcggggccg gcaccgtcgg tcgcacacaa atcgctcgcc ggagcgccgc 960
cgtatggacc gacggctgcg tcgaggtgct ggccgacagc ggcaaccgcc ggcccagtac 1020
acgaccgcgc gctaaggagt agtaaccagg ctctgg 1056



<210> 10
<211> 1048 
<212> DNA 

<213> Artificial Sequence 



<220>

<223> A synthetic construct. 
<400> 10

cctgcaggcc accatgaaga agcccgagct gaccgctacc agcgttgaaa aatttctcat 60
cgagaagttc gacagtgtga gcgacctgat gcagttgtcg gagggcgaag agagccgagc 120
cttcagcttc gatgtcggcg gacgcggcta tgtactgcgg gtgaatagct gcgctgatgg 180
cttctacaaa gaccgctacg tgtaccgcca cttcgccagc gctgcactac ccatccccga 240
agtgttggac atcggcgagt tcagcgagag cctgacatac tgcatcagta gacgcgccca 300
aggcgttact ctccaagacc tccccgaaac agagctgcct gctgtgttac agcctgtcgc 360
cgaagctatg gatgctattg ccgccgccga cctcagtcaa accagcggct tcggcccatt 420
cgggccccaa ggcatcggcc agtacacaac ctggcgggat ttcatttgcg ccattgctga 480
tccccatgtc taccactggc agaccgtgat ggacgacacc gtgtccgcca gcgtagctca 540
agccctggac gaactgatgc tgtgggccga agactgtccc gaggtgcgcc acctcgtcca 600
tgccgacttc ggcagcaaca acgtcctgac cgacaacggc cgcatcaccg ccgtaatcga 660
ctggtccgaa gctatgttcg gggacagtca gtacgaggtg gccaacatct tcttctggcg 720
gccctggctg gcttgcatgg agcagcagac tcgctacttc gagcgccggc atcccgagct 780
ggccggcagc cctcgtctgc gagcctacat gctgcgcatc ggcctggatc agctctacca 840
gagcctcgtg gacggcaact tcgacgatgc tgcctgggct caaggccgct gcgatgccat 900
cgtccgcagc ggggccggca ccgtcggtcg cacacaaatc gctcgccgga gcgccgccgt 960
atggaccgac ggctgcgtcg aggtgctggc cgacagcggc aaccgccggc ccagtacacg 1020
accgcgcgct aaggagtagt aacttaag 1048


<210> 11 
<211> 1174 
<212> DNA 

<213> Artificial Sequence 


<220> 

<223> A synthetic construct. 
<400> 11 

ggatccgttt gcgtattggg cgctcttccg ctgatctgcg cagcaccatg gcctgaaata 60
acctctgaaa gaggaacttg gttagctacc ttctgaggcg gaaagaacca gctgtggaat 120
gtgtgtcagt tagggtgtgg aaagtcccca ggctccccag caggcagaag tatgcaaagc 180
atgcatctca attagtcagc aaccaggtgt ggaaagtccc caggctcccc agcaggcaga 240
agtatgcaaa gcatgcatct caattagtca gcaaccatag tcccgcccct aactccgccc 300
atcccgcccc taactccgcc cagttccgcc cattctccgc cccatggctg actaattttt 360
tttatttatg cagaggccga ggccgcctct gcctctgagc tattccagaa gtagtgagga 420
ggcttttttg gaggcctagg cttttgcaaa aagctcgatt cttctgacac tagcgccacc 480
atgaccgagt acaagcctac cgtgcgcctg gccactcgcg atgatgtgcc ccgcgccgtc 540
cgcactctgg ccgccgcttt cgccgactac cccgctaccc ggcacaccgt ggaccccgac 600
cggcacatcg agcgtgtgac agagttgcag gagctgttcc tgacccgcgt cgggctggac 660
atcggcaagg tgtgggtagc cgacgacggc gcggccgtgg ccgtgtggac tacccccgag 720
agcgttgagg ccggcgccgt gttcgccgag atcggccccc gaatggccga gctgagcggc 780
agccgcctgg ccgcccagca gcaaatggag ggcctgcttg ccccccatcg tcccaaggag 840
cctgcctggt ttctggccac tgtaggagtg agccccgacc accagggcaa gggcttgggc 900
agcgccgtcg tgttgcccgg cgtagaggcc gccgaacgcg ccggtgtgcc cgcctttctc 960
gaaacaagcg caccaagaaa ccttccattc tacgagcgcc tgggcttcac cgtgaccgcc 1020
gatgtcgagg tgcccgaggg acctaggacc tggtgtatga cacgaaaacc tggcgcctaa 1080
tgatctagaa ccggtcatgg ccgcaataaa atatctttat tttcattaca tctgtgtgtt 1140
ggttttttgt gtgttcgaac tagatgctgt cgac 1174

<210> 12 
<211> 1776 
<212> DNA

<213> Artificial Sequence

<220>

<223> A synthetic construct.

<400> 12 

atggcttcca aggtgtacga ccccgagcaa cgcaaacgca tgatcactgg gcctcagtgg 60
tgggctcgct gcaagcaaat gaacgtgctg gactccttca tcaactacta tgattccgag 120
aagcacgccg agaacgccgt gatttttctg catggtaacg ctgcctccag ctacctgtgg 180
aggcacgtcg tgcctcacat cgagcccgtg gctagatgca tcatccctga tctgatcgga 240
atgggtaagt ccggcaagag cgggaatggc tcatatcgcc tcctggatca ctacaagtac 300
ctcaccgctt ggttcgagct gctgaacctt ccaaagaaaa tcatctttgt gggccacgac 360
tggggggctt gtctggcctt tcactactcc tacgagcacc aagacaagat caaggccatc 420
gtccatgctg agagtgtcgt ggacgtgatc gagtcctggg acgagtggcc tgacatcgag 480
gaggatatcg ccctgatcaa gagcgaagag ggcgagaaaa tggtgcttga gaataacttc 540
ttcgtcgaga ccatgctccc aagcaagatc atgcggaaac tggagcctga ggagttcgct 600
gcctacctgg agccattcaa ggagaagggc gaggttagac ggcctaccct ctcctggcct 660
cgcgagatcc ctctcgttaa gggaggcaag cccgacgtcg tccagattgt ccgcaactac 720
aacgcctacc ttcgggccag cgacgatctg cctaagatgt tcatcgagtc cgaccctggg 780
ttcttttcca acgctattgt cgagggagct aagaagttcc ctaacaccga gttcgtgaag 840
gtgaagggcc tccacttcag ccaggaggac gctccagatg aaatgggtaa gtacatcaag 900
agcttcgtgg agcgcgtgct gaagaacgag cagaccggtg gtgggagcgg aggtggcgga 960
tcaggtggcg gaggctccgg agggattgaa caagatggat tgcacgcagg ttctccggcc 1020
gcttgggtgg agaggctatt cggctatgac tgggcacaac agacaatcgg ctgctctgat 1080
gccgccgtgt tccggctgtc agcgcagggg cgcccggttc tttttgtcaa gaccgacctg 1140
tccggtgccc tgaatgaact gcaggacgag gcagcgcggc tatcgtggct ggccacgacg 1200
ggcgttcctt gcgcagctgt gctcgacgtt gtcactgaag cgggaaggga ctggctgcta 1260
ttgggcgaag tgccggggca ggatctcctg tcatctcacc ttgctcctgc cgagaaagta 1320
tccatcatgg ctgatgcaat gcggcggctg catacgcttg atccggctac ctgcccattc 1380
gaccaccaag cgaaacatcg catcgagcga gcacgtactc ggatggaagc cggtcttgtc 1440
gatcaggatg atctggacga agagcatcag gggctcgcgc cagccgaact gttcgccagg 1500
ctcaaggcgc gcatgcccga cggcgaggat ctcgtcgtga cccatggcga tgcctgcttg 1560
ccgaatatca tggtggaaaa tggccgcttt tctggattca tcgactgtgg ccggctgggt 1620
gtggcggacc gctatcagga catagcgttg gctacccgtg atattgctga agagcttggc 1680
ggcgaatggg ctgaccgctt cctcgtgctt tacggtatcg ccgctcccga ttcgcagcgc 1740
atcgccttct atcgccttct tgacgagttc ttctaa 1776

<210> 13
<211> 1776 
<212> DNA 

<213> Artificial Sequence 
<220>

<223> A synthetic construct. 



<400> 13 



4~~ T *^ -4- +— /— * ^ ^ « 

dLydLtgaaC 


a o its +■ ri <*t a1*t" 

adya u^— 3 y d u u 


y i_. dv_y a.y y l. 


tctccggccg 


cttgggtgga 


gaggCtattC 


60 


J. o gg c t a. u y a. u t. 


y yy v^cja — » aaua 


naraal" pnnr 
y d-\_. d. aLLyyi, 


tgctctgatg 


ccgccgtgtt 


CCQCfCtQtCa 


120 


gcgcaggggc 


y n*cyy i— ^ 


t - 1" t* t* cr a a cr 


accgacctgt 


ccggtgccct 


aaatcfaacta 


180 




k-.ciy v — . y 


a t* ca fccrcrc fccf 


gccacgacgg gcgttccttg 


cqcacrctqtQ 


240 


t~i«y cii*y l> L>y 


L> V*> CI ^ t. ■! CL C*.*—^ W 


era era aci crcrac 


tggctgctat 


tgggcgaagt 


qcccrcfpcfcacf 


300 




V-oCL U> W I— • C* w O 


tachcctacc 


gagaaagtat 


ccatcatggc 


tgatgcaatg 


360 




aLd\«y v - u uy d 


t* c c ct cr c* t~ a 


tgcccattcg 


accaccaagc 


gaaacatcgc 


420 


ct l. v^y cty wy ciy 




cr a t - cicf a acr c c 


ggtcttgtcg 


atcaggatga 


tctggacgaa 


480 


y c^y v-. a. i. ^- a.y y 


nnrtrcrcarr 
y y i * 


agccgaactg 


ttcgccaggc 


tcaaggcgcg 


catgcccgac 


540 


ggcgaggatc 


tcgtcy uyoc 


r* f* S +~ f~T/~*f t~* t~t £3 t~ 

CCauyyugaU 


gcctgcttgc 


cgaatatcat 


ggtggaaaat 


600 


ggccgctttt ctggattcat cgactgtggc cggctgggtg tggcggaccg ctatcaggac 660
atagcgttgg ctacccgtga tattgctgaa gagcttggcg gcgaatgggc tgaccgcttc 720
ctcgtgcttt acggtatcgc cgctcccgat tcgcagcgca tcgccttcta tcgccttctt 780
gacgagttct tcaccggtgg tgggagcgga ggtggcggat caggtggcgg aggctccgga 840
ggggcttcca aggtgtacga ccccgagcaa cgcaaacgca tgatcactgg gcctcagtgg 900
tgggctcgct gcaagcaaat gaacgtgctg gactccttca tcaactacta tgattccgag 960
aagcacgccg agaacgccgt gatttttctg catggtaacg ctgcctccag ctacctgtgg 1020
aggcacgtcg tgcctcacat cgagcccgtg gctagatgca tcatccctga tctgatcgga 1080
atgggtaagt ccggcaagag cgggaatggc tcatatcgcc tcctggatca ctacaagtac 1140
ctcaccgctt ggttcgagct gctgaacctt ccaaagaaaa tcatctttgt gggccacgac 1200
tggggggctt gtctggcctt tcactactcc tacgagcacc aagacaagat caaggccatc 1260
gtccatgctg agagtgtcgt ggacgtgatc gagtcctggg acgagtggcc tgacatcgag 1320
gaggatatcg ccctgatcaa gagcgaagag ggcgagaaaa tggtgcttga gaataacttc 1380
ttcgtcgaga ccatgctccc aagcaagatc atgcggaaac tggagcctga ggagttcgct 1440
gcctacctgg agccattcaa ggagaagggc gaggttagac ggcctaccct ctcctggcct 1500
cgcgagatcc ctctcgttaa gggaggcaag cccgacgtcg tccagattgt ccgcaactac 1560
aacgcctacc ttcgggccag cgacgatctg cctaagatgt tcatcgagtc cgaccctggg 1620
ttcttttcca acgctattgt cgagggagct aagaagttcc ctaacaccga gttcgtgaag 1680
gtgaagggcc tccacttcag ccaggaggac gctccagatg aaatgggtaa gtacatcaag 1740
agcttcgtgg agcgcgtgct gaagaacgag cagtaa 1776
<210> 14
<211> 1653 
<212> DNA 

<213> Artificial Sequence 


<220> 

<223> A synthetic construct. 



<400> 14 



atggccgatg


ctaagaacat 


taagaagggc 


cctgctccct 


^ ~% n 4— ^ +- 

CCLdCCCLCC 


ggaggaegge 


en 
o u 


accgctggcg 


agcagctgca 


caaggccatg 


aagaggtatg 


cccuggcgcc 






gccttcaccg 


atgcccacat 


tgaggtggac 


atcacctatg 


ccgagtactt 


cgagatyucu 


±o\j 


gtgcgcctgg 


ccgaggccat 


gaagaggtac 


ggectgaaca 


ccaaccaccg 


caccgtggcg 


O A C\ 


tgctctgaga 


actctctgca 


gttcttcatg 


ccagtgctgg 


gcgccctgtt 


categgagtg 




gccgtggccc


ctgctaacga 


catttacaac 


gagegegage 


tgctgaacag 


catgggcatt 


O O \J 


tctcagccta 


ccgtggtgtt 


cgtgt ct a.ag 


aagggcctgc 


agaagatcct 


gaacgtgcag 


Aon 


aagaagctgc 


ctatcatcca 


gaagatc ate 


atcatggact 


ctaagaccga 


ctaccagggc 


a on 


ttccagagca 


tgtacacatt 


cgtgacatct 


catctgcctc 


ctggcttcaa 


cgagtacgac 


RAC\ 


ttcgtgccag 


agtctttcga 


cagggac aaa 


accattgccc 


tgatcatgaa 


cagctctggg 


ouu 


tctaccggcc


tgcctaaggg 


cgtggcc ctg 


r-* /-n ^ j-% «-\ ^ 

cctcatcgca 


ccgcctgtgt 


gcgcttctct 


oou 


cacgcccgcg 


accctatttt 


cggcaaccag 


atcatccccg 


acaccgctat 


tetgagegtg 


720 


gtgccattcc 


accacggctt 


cggcatgttc 


accaccctgg 


gctacctgat 


ttgcggcttt 


780 


cgggtggtgc 


tgatgtaccg 


cttcgaggag 


gagctgttcc 


tgcgcagcct 


gcaagactac 


840 


aaaattcagt 


ctgccctgct 


ggtgccaacc 


ctgttcagct 


tettegctaa 


gagcaccctg 


900 


atcgacaagt


acgacctgtc 


taacctgcac 


gagattgect 


ctggcggcgc 


cccactgtct 


960 


aaggaggtgg 


gcgaagccgt 


ggccaagcgc 


tttcatctgc 


caggcatccg 


ccagggctac 


1020 


ggcctgaccg 


agacaaccag 


cgccattctg 


attaccccag 


agggegaega 


caagcctggc 


1080 


gccgtgggca 


aggtggtgcc 


attcttcgag 


gccaaggtgg 


tggacctgga 


caccggcaag 


1140 


accctgggag 


tgaaccagcg 


eggegagctg 


tgtgtgcgcg 


gecctatgat 


tatgtcegge 


1200 


tacgtgaata


accctgaggc 


cacaaacgcc 


ctgatcgaca 


aggaeggctg gctgcactct 


1260 


ggcgacattg 


cctactggga 


cgaggacgag 


cacttcttca 


tcgtggaccg 


cctgaagtct 


1320 


ctgatcaagt 


acaagggcta 


ccaggtgcjcc 


ccagccgagc 


tggagtctat 


cctgctgcag 


1380 


caccctaaca 


ttttcgacgc 


cggagtggcc 


ggcctgcccg 


aegacgatge 


eggegagctg 


1440 


cctgccgccg 


tcgtcgtgct 


ggaacaegge 


aagaccatga 


ccgagaagga gatcgtggac 


1500 


tatgtggcca


gccaggtgac 


aaccgccaag 


aagctgcgcg 


gcggagtggt 


gttcgtggac 


1560 


gaggtgccca 


agggcctgac 


eggcaagctg 


gacgcccgca 


agatccgega gatcctgatc 


1620 


aaggctaaga 


aaggcggcaa 


gategcegtg 


taa 






1653 



<210> 15 
<211> 597
<212> DNA 

<213> Streptomyces sp. 



WO 2006/034061 



PCT/US2005/033218 




<400> 15 

atgaccgagt acaagcccac ggtgcgcctc gccacccgcg acgacgtccc ccgggccgta 60
cgcaccctcg ccgccgcgtt cgccgactac cccgccacgc gccacaccgt cgacccggac 120
cgccacatcg agcgggtcac cgagctgcaa gaactcttcc tcacgcgcgt cgggctcgac 180
atcggcaagg tgtgggtcgc ggacgacggc gccgcggtgg cggtctggac cacgccggag 240
agcgtcgaag cgggggcggt gttcgccgag atcggcccgc gcatggccga gttgagcggt 300
tcccggctgg ccgcgcagca acagatggaa ggcctcctgg cgccgcaccg gcccaaggag 360
cccgcgtggt tcctggccac cgtcggcgtg tcgcccgacc accagggcaa gggtctgggc 420
agcgccgtcg tgctccccgg agtggaggcg gccgagcgcg ccggggtgcc cgccttcctg 480
gagacctccg cgccccgcaa cctccccttc tacgagcggc tcggcttcac cgtcaccgcc 540
gacgtcgagg tgcccgaagg accgcgcacc tggtgcatga cccgcaagcc cggtgcc 597



<210> 16 

<211> 1672 

<212> DNA

<213> Artificial Sequence 



<220> 

<223> A synthetic construct. 


<400> 16 

aaagccacca tggaggacgc caagaacatc aagaagggcc ccgccccctt ctaccccctg 60
gaggacggca ccgccggcga gcagctgcac aaggccatga agcgctacgc cctggtgccc 120
ggcaccatcg ccttcaccga cgcccacatc gaggtggaca tcacctacgc cgagtacttc 180
gagatgagcg tgcgcctggc cgaggccatg aagcgctacg gcctgaacac caaccaccgc 240
atcgtggtgt gcagcgagaa cagcctgcag ttcttcatgc ccgtgctggg cgccctgttc 300
atcggcgtgg ccgtggcccc cgccaacgac atctacaacg agcgcgagct gctgaacagc 360
atgggcatca gccagcccac cgtggtgttc gtgagcaaga agggcctgca gaagatcctg 420
aacgtgcaga agaagctgcc catcatccag aagatcatca tcatggacag caagaccgac 480
taccagggct tccagagcat gtacaccttc gtgaccagcc acctgccccc cggcttcaac 540
gagtacgact tcgtgcccga gagcttcgac cgcgacaaga ccatcgccct gatcatgaac 600
agcagcggca gcaccggcct gcccaagggc gtggccctgc cccaccgcac cgcctgcgtg 660
cgcttcagcc acgcccgcga ccccatcttc ggcaaccaga tcatccccga caccgccatc 720
ctgagcgtgg tgcccttcca ccacggcttc ggcatgttca ccaccctggg ctacctgatc 780
tgcggcttcc gcgtggtgct gatgtaccgc ttcgaggagg agctgttcct gcgcagcctg 840
caggactaca agatccagag cgccctgctg gtgcccaccc tgttcagctt cttcgccaag 900
agcaccctga tcgacaagta cgacctgagc aacctgcacg agatcgccag cggcggcgcc 960
cccctgagca aggaggtggg cgaggccgtg gccaagcgct tccacctgcc cggcatccgc 1020
cagggctacg gcctgaccga gaccaccagc gccatcctga tcacccccga gggcgacgac 1080
aagcccggcg ccgtgggcaa ggtggtgccc ttcttcgagg ccaaggtggt ggacctggac 1140
accggcaaga ccctgggcgt gaaccagcgc ggcgagctgt gcgtgcgcgg ccccatgatc 1200
atgagcggct acgtgaacaa ccccgaggcc accaacgccc tgatcgacaa ggacggctgg 1260
ctgcacagcg gcgacatcgc ctactgggac gaggacgagc acttcttcat cgtggaccgc 1320
ctgaagagcc tgatcaagta caagggctac caggtggccc ccgccgagct ggagagcatc 1380
ctgctgcagc accccaacat cttcgacgcc ggcgtggccg gcctgcccga cgacgacgcc 1440
ggcgagctgc ccgccgccgt ggtggtgctg gagcacggca agaccatgac cgagaaggag 1500
atcgtggact acgtggccag ccaggtgacc accgccaaga agctgcgcgg cggcgtggtg 1560
ttcgtggacg aggtgcccaa gggcctgacc ggcaagctgg acgcccgcaa gatccgcgag 1620
atcctgatca aggccaagaa gggcggcaag atcgccgtgt aataattcta ga 1672

<210> 17

<211> 1672

<212> DNA

<213> Artificial Sequence

<220>

<223> A synthetic construct.

<400> 17

aaagccacca tggaggacgc caagaacatc aagaagggcc cagcgccatt ctaccccctg 60
gaggacggca ccgccggcga gcagctgcac aaggccatga agcgctacgc cctggtgccc 120
ggcaccatcg ccttcaccga cgcacatatc gaggtggaca tcacctacgc cgagtacttc 180
gagatgagcg ttcggctggc agaggctatg aaacgctatg ggctgaacac caaccatcgc 240
atcgtggtgt gcagcgagaa cagcttgcag ttcttcatgc ccgtgttggg tgccctgttc 300
atcggcgtgg ctgtggcccc agctaacgac atctacaacg agcgcgagct gctgaacagc 360
atgggcatca gccagcccac cgtcgtattc gtgagcaaga aagggctgca aaagatcctg 420
aacgtgcaaa agaagctgcc catcatccaa aagatcatca tcatggacag caagaccgac 480
taccagggct tccaaagcat gtacaccttc gtgaccagcc atttgccgcc cggcttcaac 540
gagtacgact tcgtgcccga gagcttcgac cgcgacaaga ccatcgccct gatcatgaac 600
agtagtggca gtaccggctt acctaagggc gtggccctac cgcaccgcac cgcctgtgtc 660
cgattcagtc atgcccgcga ccccatcttc ggcaaccaga tcatccccga caccgctatc 720
ctgagcgtgg tgccatttca ccacggcttc ggcatgttca ccaccctggg ctacttgatc 780
tgcggcttcc gggtcgtgct gatgtaccgc ttcgaggagg agctattctt gcgcagcttg 840
caagactaca agattcaaag cgccctgctg gtgcccaccc tgttcagttt cttcgccaag 900
agcaccctga tcgacaagta cgacctgagc aacctgcacg agatcgccag cggcggcgcc 960
ccgctcagca aggaggtggg cgaggccgtg gccaagcgct tccacctgcc aggcatccgc 1020
cagggctacg gcctgaccga gacaaccagc gccattctga tcacccccga gggggacgac 1080
aagcctggcg cagtaggcaa ggtggtgccc ttcttcgagg ctaaggtggt ggacctggac 1140
accggtaaaa ccctgggtgt gaaccagcgc yy >-y «.y <- <-y <- arcrtccatoa y ^y ccccatgatc 1200
atgagcggct acgttaacaa ccccgaggct acaaacgccc tgatcgacaa ggacggctgg 1260
ctgcacagcg gcgacatcgc ctactgggac gaggacgagc acttcttcat cgtggaccgg 1320
ctgaagagcc tgatcaaata caagggctac caggtagccc cagccgaact ggagagcatc 1380
ctgctgcagc accccaacat cttcgacgcc ggggtcgccg gcctgcccga cgacgatgcc 1440
ggcgagctgc ccgccgcagt cgtggtgctg gagcacggta aaaccatgac cgagaaggag 1500
atcgtggact atgtggccag ccaggttaca accgccaaga agctgcgcgg cggcgtggtg 1560
ttcgtggacg aggtgcctaa aggcctgacg ggcaagttgg acgcccgcaa gatccgcgag 1620
attctgatca aggccaagaa gggcggcaag atcgccgtgt aataattcta ga 1672

<210> 18 
<211> 1672 
<212> DNA
<213> Artificial Sequence 

<220> 

<223> A synthetic construct. 


<400> 18 



aaagc caeca 


tggaagatgc 


caaaaacatt 


aagaagggee 


cagcgccatt 


ctacccactg 


oU 


y ayy acyy ca 


ccgccggcga 


gcagctgcac 


s a a /t ~» 4* 

adayCCaCyd 


ayCyCtaCyC 


cccyyuyccc 


IDA 


yy caeca ccy 


cctttaccga 


cgcacatatc 


gaggtggaca 


tcacctacgc 


cgagtacttc 


1 OA 

loU 


loyayauyagcy 


ttcggctggc 


agaggc tatg 


aagcgctatg 


ggctgaatac 


caaccatcgc 


O A A 


atcgtggugt: 


geagegagaa 


tagc t tgcag 


ttc t teatge 


ccgtgttggg 


tgccctgttc 


O A A 


at-Cygtytyy 


c cgcggcccc 


agcuaacgac 


acccacaacg 


agegegage t 


gctgaacagc 


ioO 


a.Cy y y CatCa. 


yCCciyCCCaC 


4- r*rt 4* ^ 4* 4* ^» 

cy ccg tatcc 


gtgagcaaga 


aagggctgea 


aaagatcctc 


/IDA 


a a /"i (™f 4"/"*^ a a a 

aauy tyCaaa 


a /*f a a 4* a /■< /"» 
ayaayC UdCC 


«a 4*/"*a4*a/"»aa 

ya.ccaca.caa. 


a.ayaccacca 


Lcatygatay 


caayaccyac 


A Q A 


O A 4" a ^ /™t « /T /t/t 4* 

j6 u UaLCayyyCL 


4* /t a a a /t /^-i -a +- 

CCCaaayCat. 


y cacacc etc 


gtgaccagcc 


at t tgccacc 


cggcttcaac 


C/l A 


gagtacgact tcgtgcccga gagcttcgac cgggacaaaa ccatcgccct gatcatgaac 600
agtagtggca gtaccggatt gcccaagggc gtagccctac cgcaccgcac cgcctgtgtc 660
cgattcagtc atgcccgcga ccccatcttc ggcaaccaga tcatccccga caccgctatc 720
ctcagcgtgg tgccatttca ccacggcttc ggcatgttca ccacgctggg ctacttgatc 780
tgcggctttc gggtcgtgct catgtaccgc ttcgaggagg agctattctt gcgcagcttg 840
caagactata agattcaaag cgccctgctg gtgcccacac tgttcagctt cttcgccaag 900
agcactctca tcgacaagta cgacctgagc aacctgcacg agatcgccag cggcggggcg 960
ccgctcagca aggaggtggg cgaggccgtg gccaagcgct tccacctacc aggcatccgc 1020
cagggctacg gcctgacaga aacaaccagc gccattctga tcacccccga aggggacgac 1080
aagcctggcg cagtaggcaa ggtggtgccc ttcttcgagg ctaaggtggt ggacttggac 1140
accggtaaga ccctgggtgt gaaccagcgc ggcgagctgt gcgtccgtgg ccccatgatc 1200
atgagcggct acgttaacaa ccccgaggct acaaacgctc tcatcgacaa ggacggctgg 1260
ctgcacagcg gcgacatcgc ctactgggac gaggacgagc acttcttcat cgtggaccgg 1320
ctgaagagcc tgatcaaata caagggctac caggtagccc cagccgaact ggagagcatc 1380
ctgctgcaac accccaacat cttcgacgcc ggggtcgccg gcctgcccga cgacgatgcc 1440
ggcgagctgc ccgccgcagt cgtcgtgctg gagcacggta aaaccatgac cgagaaggag 1500
atcgtggact atgtggccag ccaggttaca accgccaaga agctgcgcgg tggtgttgtg 1560
ttcgtggacg aggtgcctaa aggcctgacg ggcaagttgg acgcccgcaa gatccgcgag 1620
attctcatta aggccaagaa gggcggcaag atcgccgtgt aataattcta ga 1672
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<210> 19 
<211> 1672 
<212> DNA 

<213> Artificial Sequence 


<220> 

<223> A synthetic construct. 



<400> 19 



T f\ a a a a a 
XUctactyv_V_Clv_V_ci 


t g g a e±y el i. y t. 


v_ctctctclctv_ct L, L_ 


a ana a nnnr* c* 
ct cty ctcty y y v_ v_ 


\_>cLy v—>y v_ v_. ci i_ t_> 


V_L.CtV_V_V_CtV_L.l_ 


60 




ccyttyy tya 


y i-cty (_ tyudv/ 


a a a rrr* paf - rra 
ctctcty i_.i_.ct i_.y ct 


anr , rTP , +" a prrp 
cty Ly vv u ctv_y v_ 


i_v_.t_yyv,yi_L.v_. 


120 




r , f , t"'t - t - afT 1 oa 
LtLL.L-av^L>ya 


V_yV_ctV_CtV_.Ctl_.l_ 


y ctyy L.yyctu>ct 


hfapp^appjp 
i_ i_ ct v_ v_ u ct v_ y v_ 


r>rra rrf" af't't'P 
v_y cty v_.civ_l.v_.v_ 


180 


yagatgagcg 


t ttyyt tyyt 


cty ct cty v_ L-ci uy 


ctcty Ly l LaLy 


ggctgaacac 


caaccatcgc 




ct L,t_y uyy uy u 


y w ciy v-y ciy dd 


l. cty t_ i_ L,y v_ cty 


^~t - f , i - 't - ^ , at - r , TP , 

LLtLLLaLy U 


ccgtgttggg 


tgccctgttc 




1 c a ^ prf rr^ n ^ rtrr 

iOdt.cyytgt.yy 


l. uy tyyt^LL 


cty v_ l_ ctctv_y ctv_ 


ct L.L. L-Ctv_ ctclv_y 


agegegaget 


gctgaacagc 




ctL.yyyi_ctL.i_.ci 


yv_.L-.ctyt_.v_v_.ctv_ 


/~i /~» 4- «»i prf~ a t" t" p 
V_yi_v_yi_CtL.L.L. 


prt - <~r/^ a ana 
y v_.yctyt_.ctciyct 


aagggctgea 


aaagatcctc 


420 


clctl_.y L.y V_ dct ct 


ana arfrhapp 

ciy ct cty v_ l. ctv_ v_ 


ycti_.i_.ct l. ctv_ ctct 


aanat*rat*pa 
ctctyct t_.v_ctv_.v_ct 


tcatggatag 


caagaccgac 




t_.ctv_.v_. cty yy l. l 


i_ v_ v— ctctcty v_ ct L- 


fYt*ar , ai"*f- , t"t"r' 

y L- CI V_ CI V_ V_ t_.l_.v_. 


of* era ri*hppp 

y v_y CIV_ L. v_ v_ V_ v_ 


atttgccacc 


cggcttcaac 


540 


y cty i_ctt_y cxt— L. 


t-v-y uy tv-ty a. 


nanpht" pns r~ 
y cty v_ l_ Lt^y ctv_. 


cctnazi fa a a a 
v_yyyctv_ctctctci 


ccatcgccct 


gatcatgaac 


D W v 


5 0 acrt~ aahaana 


at* a rrocra t* t* 

y iwUv*vvVjy o. i— i_ 


vjuwUduM vj v_ 


at" a crrr'r'f" a r* 

^-j V_ CLVJ V_ V_ V_ WCIW 


cgcaccgcac 


cgcttgtgtc 


660 


cgattcagtc atgcccgcga ccccatcttc ggcaaccaga tcatccccga caccgctatc 720
ctcagcgtgg tgccatttca ccacggcttc ggcatgttca ccacgctggg ctacttgatc 780
tgcggctttc gggtcgtgct catgtaccgc ttcgaggagg agctattctt gcgcagcttg 840
caagactata agattcaaag cgccctgctg gtgcccacac tgttcagttt cttcgccaag 900
agcactctca tcgacaagta cgacctaagc aacttgcacg agatcgccag cggcggggcg 960
ccgctcagca aggaggtggg cgaggccgtg gccaaacgct tccacctacc aggcatccgc 1020
cagggctacg gcctgacaga aacaaccagc gccattctga tcacccccga aggggacgac 1080
aagcctggcg cagtaggcaa ggtggtgccc ttcttcgagg ctaaggtggt ggacttggac 1140
accggtaaga cactgggtgt gaaccagcgc ggcgagctgt gcgtccgtgg ccccatgatc 1200
atgagcggct acgttaacaa ccccgaggct acaaacgctc tcatcgacaa ggacggctgg 1260
ctgcacagcg gcgacatcgc ctactgggac gaggacgagc acttcttcat cgtggaccgg 1320
ctgaagagcc tgatcaaata caagggctac caggtagccc cagccgaact ggagagcatc 1380
ctgctgcaac accccaacat cttcgacgcc ggggtcgccg gcctgcccga cgacgatgcc 1440
ggcgagctgc ccgccgcagt cgtcgtgctg gaacacggta aaaccatgac cgagaaggag 1500
atcgtggact atgtggccag ccaggttaca accgccaaga agctgcgcgg tggtgttgtg 1560
ttcgtggacg aggtgcctaa aggcctgacg ggcaagttgg acgcccgcaa gatccgcgag 1620
attctcatta aggccaagaa gggcggcaag atcgccgtgt aataattcta ga 1672



<210> 20 
<211> 1672
<212> DNA 

<213> Artificial Sequence 
<220>

<223> A synthetic construct. 
<400> 20 

5aaagccacca tggaagatgc caaaaacatt 
gaagacggca ccgccggcga gcagctgcac 
ggcaccatcg cctttaccga cgcacatatc 
gagatgagcg ttcggctggc agaagctatg 
atcgtggtgt gcagcgagaa tagcttgcag 

lOatcggtgtgg ctgtggcccc agctaacgac 
atgggcatca gccagcccac cgtcgtattc 
aacgtgcaaa agaagctacc gatcatacaa 
taccagggct tccaaagcat gtacaccttc 
gagtacgact tcgtgcccga gagcttcgac 

agtagtggca gtaccggatt gcccaagggc
cgattcagtc atgcccgcga ccccatcttc 
ctcagcgtgg tgccatttca ccacggcttc 
tgcggctttc gggtcgtgct catgtaccgc 
caagactata agattcaaag cgccctgctg 

agcactctca tcgacaagta cgacctaagc

ccgctcagca aggaggtggg cgaggccgtg 
cagggctacg gcctgacaga aacaaccagc 
aagcctggcg cagtaggcaa ggtggtgccc 
accggtaaga cactgggtgt gaaccagcgc 
atgagcggct acgttaacaa ccccgaggct
ctgcacagcg gcgacatcgc ctactgggac 
ctgaagagcc tgatcaaata caagggctac 
ctgctgcaac accccaacat cttcgacgcc 
ggcgagctgc ccgccgcagt cgtcgtgctg 

atcgtggact atgtggccag ccaggttaca

ttcgtggacg aggtgcctaa aggcctgacg 
attctcatta aggccaagaa gggcggcaag 

<210> 21 
<211> 1672
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> A synthetic construct.
<400> 21 

aaagccacca tggaagatgc caaaaacatt






aagaagggcc 


cagcgccatt 




60 


aaagccatga 


agcgctacgc 


c t* It ctrs t" ct c f r* 




gaggtggaca 


ttacctacgc 


t*y ay LaLL 


180


aagcgctatg ggctgaacac 




240


ttcttcatgc 


ccgtgttggg 


t*n^*•^ , ^~•t*f , f'^"'^ - ^*• 
Lyi<UUL^ L- l_- 




atctacaacg 


agcgcgagct 


gctgaacagc


360


gtgagcaaga 


aagggctgca 


s a a rra ^ r* r* 
cta-CtyctL-v.-l-. U. 


420


aagatcatca 


tcatggatag 


caagaccgac


480


gtgacttccc 


atttgccacc 


Oyy L. LL-CtClvJ 




cgggacaaaa 


ccatcgccct 


trial" fr a o /*• 

ydL.^ctLyctctv_r 


600


gtagccctac 


cgcaccgcac 


cgcttgtgtc


660


ggcaaccaga 


tcatccccga 


caccgctatc 


720


ggcatgttca 


ccacgctggg 


ctacttgatc


780


ttcgaggagg 


agctattctt 


gcgcagcttg


840


gtgcccacac 


tgttcagttt 


cttcgctaag




aacttgcacg 


agatcgccag 


cggcggggcg 


960


gccaaacgct 


tccacctacc 


aggcatccgc 


1020 


gccattctga 


tcacccccga 


aggggacgac 


1080 


ttcttcgagg 


ctaaggtggt 


ggacttggac 


1140 


ggcgagctgt gcgtccgtgg 


ccccatgatc 


1200 


acaaacgctc 


tcatcgacaa 


ggacggctgg 


1260 


gaggacgagc 


acttcttcat 


cgtggaccgg 


1320 


caggtagccc 


cagccgaact 


ggagagcatc 


1380 


ggggtcgccg 


gcctgcccga 


cgacgatgcc 


1440 


gaacacggta 


aaaccatgac 


cgagaaggag 


1500 


accgccaaga 


agctgcgcgg 


tggtgttgtg 


1560 


ggcaagttgg 


acgcccgcaa 


gatccgcgag 


1620 


atcgccgtgt 


aataattcta 


ga 


1672 



aagaagggcc cagcgccatt ctacccactc 60
py a ana p'prprp* a
y dctyciL^yy^— d 




y i*cty \-> L ClL, 


a a P, PIP* PP. t" (13 

CL CI CLy \_r <xCl U^jCl 


=a rrpTiP" t~ o rt/-rrt 
ciy v^.yv_ i_cLV»»y\_ 


r^Y' ptpi t* ptp" p* p 
i-L- 1— y y uy ull 


120


prpjp*a rfa tr P 1 pt 


v_» k_. L. L- l. ci v.. t_y d 


p , PTP*a pflhaiT 


□a aa fccraac a 





L.y ciy uaL l. l- v.. 


180 






a pra a np t* at"n 
d.y ct ciy ^uaug 


ciciy^y^L.ciL.y 


yyv* L.y etc* l. 


aaarnalr PiPf 
dctctL.v^ct LLyy 


240 


ciucy uyy tyt 


rrfia pip* pi a na 3 
y L^cty ^*y **y 


ucty l» u. L.y L.cty 


f* t* n t" t* f a t*np 


n rtrrt* nt" t - nnn 
v-.y L.y uy y y 


4-rtrtrtrt4-rt4-4 | -y--i 
LyLLLLyULL 


300


C *i ^ 4— ri 4- /"t**t 


uy uyyuLLt« 


anp^aapna p* 
ay L> ucicn^y aL 


flhphapaapfl 

d ut- ucLOciciL^y 


y y y y ^ ^* 


rti— 1 4- rt; -3 -a parrp 
yLLydaLdyL 


360


* 4— nr*ff n fa 4~ ^ 

dLyyyCaLCa 


*"r/"* rt a cry* rt /-t a /*i 

yLLayLwLdL 


/i/t4~ rt /-r4~ a 4~ rt 
t*y L,L.y LattL 


y uy ciy ctcty ct 


ctcty y yt-. igta 


a a a +" rt rt 4™ rt 

dadgatcctc 


420


?a fa /~* 4— r^r* fa a. fa 

aaCgtyCaaa 


agaagctacc 


gatcatacaa 


•a ra/Tia trnatra 
adgaLLaLLa 


/-i ^3 4— rtrrt; fa a rt? 

i_L.ciL.yyciL.cty 


rt a arta rt rt/^fa rt 

caagaccgac 


480


taccagggct 


t ccaaagcat 


/^W 4— ^ /"I *~| M n if— f~% 

gUaCaCCCtC 


f< 1 fa ^ -4" rt rt rt 

yugacttccc 


4— 4~ /~-r /™i rt fa rt 

aCtCyCCdCC 


rt rtrrtr rt 4~* 4" rt fa rt 

cyyCLLCdac 


540


yagtacgact 


ucgcgcccga 


/Tfa /**rrt 4- t* rtrtr fa rt 

gagcuccgac 


rt/*r/"*rt a naaaa 

CgggaCaoaa 


rt rt "3 4— rt/™f rt rt rt 

ccatcgcccL 


yrfa 4~ rt fa 4" rt; fa fa rt 

gaucaLyaac 




agtagcggca


gtaccggacc 


gcccaagggc 


**rt- fx/t rt rt "Ti rt 

gCagccccau 


rt rt -i rtrt/^rrt a rt 
cgcaccgcac


rtrtfrt 4~ f* rt-4— rt" 4* rt 

cgctcgLgcc 


660


cgattcagtc 


augcccgcga 


ccccatcccc 


^rrt? rt f^ f^ rt rt f^ rt7 fa 

ggcaaccaga 


4™ rt fa 4)* rt ^* rt /"t ^ 

ucacccccga 


rt fa rt rt rtfrt 4" f! 4** rt^ 

caccgctatc


720


ctcagcgtgg 


tgccatttca


ccacggcccc 


rt: rt? rt ^ 4~ rt" 4— rt fa 

ggcatgttca


rt rt fa /*^r +- rtrrt' rt; 

ccacgctggg


rt 4™ fa rt 4"* rt; fa 4* rt 

cuaccuyaLC 


780


tgcggctttc


gggtcgtgct 


catgtaccgc 


ttcgaggagg 


agctattctt 


gcgcagcttg 


840


caagactata


agattcaaag 


cgccctgctg 


gtgcccacac 


4-rt4-'t-rt^-irt4-4-4- 

ugutcagect: 


cttcgctaag


900


agcactctca


tcgacaagta 


cgacctaagc 


aacttgcacg


agatcgccag


cggcggggcg


960


ccgctcagca 


aggaggtagg 


tgaggccgtg 


gccaaacgct 


tccacctacc 


aggcatccgc 




cagggctacg 


gcctgacaga


aacaaccagc 


gccattctga


tcacccccga 




1080


aagcctggcg 


cagtaggcaa 


ggtggtgccc 


ttcttcgagg 


ctaaggtggt


ggacttggac 


1140


accggtaaga


cactgggtgt


gaaccagcgc 


ggegagcuge 


gcgtccgtgg 


ccccatgatc 


1200


atgagcggct


acgttaacaa 


ccccgaggct 


acaaacgctc 


tcatcgacaa 


ggacggctgg 


1260 


ctgcacagcg 


gcgacatcgc 


ctactgggac 


gaggacgagc 


acttcttcat 


cgtggaccgg 


1320 


ctgaagagcc 


tgatcaaata 


caagggctac 


caggtagccc 


cagccgaact 


ggagagcatc 


1380 


ctgctgcaac 


accccaacat 


cttcgacgcc 


ggggtcgccg 


gcctgcccga 


cgacgatgcc 


1440 


ggcgagctgc 


ccgccgcagt 


cgtcgtgctg 


gaacacggta 


aaaccatgac 


cgagaaggag 


1500 


atcgtggact


atgtggccag 


ccaggttaca 


accgccaaga 


agctgcgcgg


tggtgttgtg 


1560 


ttcgtggacg 


aggtgcctaa 


aggcctgacg


ggcaagttgg 


acgcccgcaa 


gatccgcgag


1620 


attctcatta 


aggccaagaa


gggcggcaag 


atcgccgtgt 


aataattcta 


ga 


1672 


<210> 22 














<211> 1672














<212> DNA 














<213> Artificial Sequence 










<220> 














<223> A synthetic construct.










<400> 22 














aaagccacca 


tggaagatgc 


caaaaacatt 


aagaagggcc


cagcgccatt 


ctacccactc 


60 


gaagacggga 


ccgccggcga 


gcagctgcac 


aaagccatga 


agcgctacgc 


cctggtgccc 


120 


ggcaccatcg


cctttaccga 


cgcacatatc 


gaggtggaca 


ttacctacgc 


cgagtacttc 


180 


gagatgagcg 


ttcggctggc 


agaagctatg 


aagcgctatg 


ggctgaatac 


aaaccatcgg


240 


atcgtggtgt 


gcagcgagaa


tagcttgcag


ttcttcatgc 


ccgtgttggg 


tgccctgttc 


300 


atcggtgtgg 


ctgtggcccc 


agctaacgac 


atctacaacg


agcgcgagct


gctgaacagc


360 
atgggcatca gccagcccac cgtcgtattc
aacgtgcaaa agaagctacc gatcatacaa 
taccagggct tccaaagcat gtacaccttc 
gagtacgact tcgtgcccga gagcttcgac 
agtagtggca gtaccggatt gcccaagggc
cgattcagtc atgcccgcga ccccatcttc 
ctcagcgtgg tgccatttca ccacggcttc 
tgcggctttc gggtcgtgct catgtaccgc 
caagactata agattcaatc tgccctgctg 

agcactctca tcgacaagta cgacctaagc
ccgctcagca aggaggtagg tgaggccgtg 
cagggctacg gcctgacaga aacaaccagc 
aagcctggcg cagtaggcaa ggtggtgccc 
accggtaaga cactgggtgt gaaccagcgc 

atgagcggct acgttaacaa ccccgaggct
ctgcacagcg gcgacatcgc ctactgggac 
ctgaagagcc tgatcaaata caagggctac 
ctgctgcaac accccaacat cttcgacgcc 
ggcgagctgc ccgccgcagt cgtcgtgctg 

atcgtggact atgtggccag ccaggttaca
ttcgtggacg aggtgcctaa aggcctgacg 
attctcatta aggccaagaa gggcggcaag 

<210> 23 
<211> 1672
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> A synthetic construct.
<400> 23 

aaagccacca tggaagatgc caaaaacatt 
gaagacggga ccgccggcga gcagctgcac 

ggcaccatcg cctttaccga cgcacatatc
gagatgagcg ttcggctggc agaagctatg 
atcgtggtgt gcagcgagaa tagcttgcag 
atcggtgtgg ctgtggcccc agctaacgac 
atgggcatca gccagcccac cgtcgtattc 

aacgtgcaaa agaagctacc gatcatacaa
taccagggct tccaaagcat gtacaccttc 
gagtacgact tcgtgcccga gagcttcgac 
agtagtggca gtaccggatt gcccaagggc 






gtgagcaaga 


aagggctgca 


aaagatcctc 


420 


aagatcatca 


tcatggatag 


caagaccgac 


480 


gtgacttccc 


atttgccacc 


cggcttcaac 


540 


cgggacaaaa 


ccatcgccct 


gatcatgaac 


600 


gtagccctac 


cgcaccgcac 


cgcttgtgtc 


660 


ggcaaccaga 


tcatccccga 


caccgctatc 


720 


ggcatgttca 


ccacgctggg 


ctacttgatc 


780 


ttcgaggagg 


agctattctt 


gcgcagcttg 


840 


gtgcccacac 


tatttagctt 


cttcgctaag 


900 


aacttgcacg 


agatcgccag 


cggcggggcg 


960 


gccaaacgct 


tccacctacc 


aggcatccgc 


1020 


gccattctga 


tcacccccga 


aggggacgac 


1080 


ttcttcgagg 


ctaaggtggt 


ggacttggac 


1140 


ggcgagctgt


gcgtccgtgg


ccccatgatc 


1200 




L> d u<M^ciwaa 


yyauyyu i-yy 


1260 


gaggacgagc 


acttcttcat 


cgtggaccgg 


1320 


caggtagccc 


cagccgaact 


ggagagcatc 


1380 


ggggtcgccg 


gcctgcccga 


cgacgatgcc 


1440 


gaacacggta 


aaaccatgac 


cgagaaggag 


1500 


accgccaaga 


agctgcgcgg 


tggtgttgtg 


1560 


ggcaagttgg 


acgcccgcaa 


gatccgcgag 


1620 


atcgccgtgt 


aataattcta 


ga 


1672 



aagaagggcc 


cagcgccatt 


ctacccactc 


60 


aaagccatga 


agcgctacgc 


cctggtgccc 


120 


gaggtggaca 


ttacctacgc 


cgagtacttc 


180 


aagcgctatg 


ggctgaatac 


aaaccatcgg 


240 


ttcttcatgc 


ccgtgttggg 


tgccctgttc 


300 


atctacaacg 


agcgcgagct 


gctgaacagc 


360 


gtgagcaaga 


aagggctgca 


aaagatcctc 


420 


aagatcatca 


tcatggatag 


caagaccgac 


480 


gtgacttccc 


atttgccacc 


cggcttcaac 


540 


cgggacaaaa 


ccatcgccct 


gatcatgaac 


600 


gtagccctac 


cgcaccgcac 


cgcttgtgtc 


660 
i~i f-r a f - 1~ 3 n f~ r*
L.ydL.L.L.dyL-L. 


dL.yLL.L.yL<yd 


LLL.LdL.L.L.LL> 


ao c Pi a c f* a cr a 


V»Cl ^ ^rf \rf \-« y CI 


LaLL|jl<LaL^ 


720 


ctcagcgtgg


tgccatttca




yyuctuy lllci 


ppa /"•pip*! - ptptpt 
LLdLyLi.yyy 


LlaLL Ly d L L. 


780 


+— jt /-» r-» it r-« 4- 4" 

LgcggcLtcc 


/TrtfT+" 4~ r*rri4~ 

gggt-cg cgcu 


Ld Ly LaLvyL 


l LLydyy dyy 


agctattctt 


gcgcagcttg 


840 


caagactata


agattcaatc


4" y^r ^»^*4~ nr* 4* pt 

tycccLyLty 


y Ly LLLaLaL 


tatttagctt 


cttcgctaag




agcactctca


tcgacaagta


CydLL Lddy L 


ddL L uy LdLy 


agatcgccag


cggcggggcg


960 


ccycccagca 


aggaggcagg 


tgaggccgtg


i-rr^ a a a ri pyp» ♦* 
ycCddaLyLL 


tccacctacc 


aggcatccgc 




cagggctacg 


gcctgacaga


aacaaccagc


/-r/i a f/* fp P» +T i—r a 

yLLdLtL uy d 


tcacccccga 


aggggacgac 




aagcccggcy 


cagtaggcaa


ggtggtgccc


LLLLLLyayy 


ctaaggtggt 


ggacttggac 


1140


accggtaaga


cactgggtgt


gaaccagcgc


ygegagct-gc 


gcgtccgtgg 


ccccatgatc 


1200


atgagcggct


acgttaacaa 


ccccgaggct 


acaaacgctc 


tcatcgacaa ggacggctgg 


1260 


ctgcacagcg 


gcgacatcgc 


ctactgggac 


gaggacgagc 


?irt"t*ri"t"pah 




1320 


ctgaagagcc


tgatcaaata 


caagggctac 


caggtagccc 


L-CiyL.L.yctCLL- l— 


y y dy ay lqil 


1380 


ctgctgcaac 


accccaacat 


cttcgacgcc 


ggggtcgccg 


yLL LyLLLyd 


L>ydLydL.yL>L> 


1440


ggcgagctgc 


ccgccgcagt 


cgtcgtgctg 


gaacacggta 


aaa ppa^ - era /■» 


L>y ayaayy ciy 


1500 


atcgtggact


atgtggccag 


ccaggttaca 


accgccaaga 


»yL LyLyLyy 


uyy Ly uuy uy 


1560 


ttcgtggacg 


aggtgcctaa 


aggactgacc 


ggcaagttgg 


aLyLLLyLeta 


na t" ppnprran 
yatLLyLyay 


1620 


attctcatta 


aggccaagaa


gggcggcaag 


atcgccgtgt 


dd L. CtCt L- L. l» ua 


yd 


1672 


<210> 24 














<211> 1672














<212> DNA 














<213> Artificial Sequence 










<220> 














<223> A synthetic construct.










<400> 24 














aaagccacca 


tggaggatgc 


taagaatatt 


aagaaggggc


ctgctccttt 


ttatcctctg 


60


gaggatggga 


cagctgggga 


gcagctgcat


aayycuaLya 


agagatatgc 


tctggtgcct 




gggacaattg


cttttacaga 


tgctcatatt 


gaggtggata


ttacatatgc 


tgagtatttt 


180


gagatgtctg 


tgagactggc 


tgaggctatg 


a a /~ra /ra ♦"at" ^ 

ddy dy dLdLy 


ggctgaatac 


aaatcataga 


240


attgtggtgt 


gttctgagaa 


ttctctgcag 


t-i-t-t-t-i-at-rTr* 


ctgtgctggg 


ggctctgttt 


300


attggggtgg 


ctgtggctcc 


tgctaatgat 


a L L LdLaauy 


agagagagct 


gctgaattct 


360


atggggattt 


ctcagcctac 


agtggtgttt 


y uy ll Laaya 


aggggctgca


gaagattctg 


420


aatgtgcaga


agaagctgcc


tattattcag 


a a /~ra t~ t~ .23 fp tr 3 
day attauud 


ttatggattc 


taagacagat 


480


tatcaggggt 


ttcagtctat 


gtatacattt 


gtgacatctc 


atctgcctcc 


tgggtttaat 


540 


gagtatgatt 


ttgtgcctga 


gtcttttgat 


agagataaga 


caattgctct


gattatgaat 


600 


tcttctgggt 


ctacagggct 


gcctaagggg


gtggctctgc 


ctcatagaac 


agcttgtgtg 


660 


agattttctc 


atgctagaga


tcctattttt 


gggaatcaga 


ttattcctga 


tacagctatt 


720 


ctgtctgtgg


tgccttttca


tcatgggttt 


gggatgttta 


caacactggg gtatctgatt 


780 


tgtgggttta 


gagtggtgct 


gatgtataga 


tttgaggagg 


agctgtttct 


gagatctctg 


840 


caggattata 


agattcagtc 


tgctctgctg 


gtgcctacac 


tgttttcttt 


ttttgctaag 


900 


tctacactcra 


ttaataaata 


taatctatct 


aatctcrcatcf 


aoattacttc 


terqqaqqeret 


960 
c etc tot eta




ggaggctgtg


gctaagagat 


ttcatctgcc 


tgggattaga 


1020 


caaaaatata 


ggctgacaga 


gacaacatct 


gctattctga 


ttacacctga 


gggggatgat 


1080 




c fcert craocraa 


cratacrtacct 


ttttttgagg 


ctaaggtggt 


ggatctggat 



1140 




cactcracraat 


gaatcagaga 


ggggagctgt


gtgtgagagg


gcctatgatt


1200 


atgtctgggt


atgtgaataa 


tcctgaggct 


acaaatgctc


tgattgataa 


ggatgggtgg 


1260 


ctgcattctg 


gggatattgc 


ttattgggat 


gaggatgagc 


atttttttat 


tgtggataga 


1320 




t~ era t~ 1" a aa t a 


taaggggtat 


caaataactc 


ctgctgagct


ggagtctatt 


1380 




atcctaatat 


ttttgatgct


acraataqctq 


ggctgcctga


tgatgatgct


1440 


yyyy ^-y <-y <- 


r* t* rr r* 1~ cr r* t" cr t~ 


yy u yy >-y*-»-y 


aaacafcaaaa 


agacaatgac 


agagaaggag 


1500 


attgtggatt


atgtggcttc 


tcaggtgaca 


acagctaaga 


agctgagagg 


gggggtggtg 


1560 


tttgtggatg


aggtgcctaa 


ggggctgaca 


gggaagctgg 


atgctagaaa 


gattagagag 


1620 


attctgatta 


aggctaagaa 


gggggggaag 




aaLaa L- i_ t_ el. 


era 
yet 


1672 


<210> 25 














<211> 1672














<212> DNA 














<213> Artificial Sequence 










<220> 














<223> A synthetic construct.










<400> 25 














aaagccacca 


tggaagatgc 


taaaaacatt 


aagaaggggc


ctgctccttt 


ctaccctctg 


60 


gaggatggga 


ctgccgggga 


gcagctgcat


aaagctatga


agcggtatgc


tctggtgcca


120 


ggcacaattg


cgttcacgga 


tgctcacatt 


gaggtggaca


ttacatacgc


tgagtatttt 


180 


gagatgtcgg


tgcggctggc 


tgaggctatg 


aagcgatatg 


ggctgaatac 


aaaccataga 


240 


attgtagtgt 


gctctgagaa 


ctcgttgcag 


ttttttatgc 


ctgtgctggg


ggctctcttc 


300 


atcggggtgg 


ctgtggctcc 


tgctaacgac


atttacaatg 


agagagagct 


tttgaactcg 


360 


atggggattt 


ctcagcctac 


agtggtgttt 


gtgagtaaga


aagggcttca 


aaagattctc 


420 


aatgtgcaaa


agaagctgcc


tattattcaa 


aagattatta 


ttatggactc 


taagacagac 


480 


taccaggggt 


ttcagtctat 


gtatacattt 


gtgacatctc 


atctgcctcc 


tgggttcaac 


540 


gagtatgact 


ttgtgcccga 


gtctttcgac 


agagataaga 


caattgctct


gattatgaat 


600 


tcatctgjggt 


ctaccgggct 


gcctaagggt


gtagctctgc


cacatagaac 


agcttgtgtg 


660 


agattttctc 


atgctaggga 


ccctattttt 


gggaatcaga


ttattcctga 


tactgctatt


720 


ctgtcggttg


tgccctttca 


tcatgggttt 


gggatgttta


caacactggg 


ctacctgata 


780 


tgtgggttta 


gagtggtgct 


catgtatagg 


tttgaggagg 


ayCuttttLL. 


fir* /*f /"< ^ /-< /T 

yoyjc ut* uy 


840


caagattata


agattcagtc 


tgctctgctg 


gtgcctacac 


tgttttcttt 


ttttgctaag 


900 


tctaccotga 


tcgataagta 


tgatctgtcc 


aacctgcacg 


agattgcttc


tgggggggct 


960 


cctctgtcta


aggaggtagg 


tgaggctgtg 


gctaagcgct


ttcatctgcc 


tggaatcaga 


1020 


caggggtatg


ggctaacaga 


aacaacatct 


gctattctga 


ttacaccaga 


gggggatgat 


1080 


aagcccgggg 


ctgtagggaa 


agtggtgccc 


ttttttgaag 


ctaaagtagt 


tgatcttgat 


1140 


accggtaaga


cactgggggt


gaatcagcga


ggggaactgt 


gtgtgagagg 


gcctatgatt


1200 


atatcacrQQt 


atgtaaacaa


ccctgaggct


acaaatgctc


tgattgataa


ggatgggtgg 


1260 
ctgcattcgg gcgatattgc ttactgggat

ctgaagtcgt tgatcaaata taaggggtat 

ctgcttcaac atcctaacat tttcgatgct 

ggggagctgc ctgctgctgt agtggtgctg 

attgtggatt atgtggcttc acaagtgaca

tttgtggatg aggtgcctaa agggctgaca 

attctgatta aggctaagaa gggtggaaag 




gaggatgagc atttcttcat cgtggacaga 1320
caagtagctc ctgctgagct ggagtccatt 1380 
ggggtggctg ggctgcctga tgatgatgct 1440
gagcacggta agacaatgac agagaaggag 1500 
acagctaaga aactgagagg tggcgttgtg 1560 
ggcaagctgg atgctagaaa aattcgagag 1620 
attgctgtgt aatagttcta ga 1672 



<210> 26 
<211> 1672
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> A synthetic construct.



<400> 26 

aaagccacca tggaagatgc taaaaacatt 
gaagatggga ctgctggcga gcaacttcac 

ggcacaattg cgttcacgga tgctcacatt

gagatgtcgg tgcggctggc agaagctatg 
attgtagtgt gcagtgagaa ctcgttgcag 
atcggggtgg ctgtggctcc tgctaacgac 
atggggattt ctcagcctac agtggtgttt 
aatgtgcaaa agaagctgcc tattattcaa
taccaggggt ttcagtctat gtatacattt 
gagtacgact tcgtgcccga gtctttcgac 
tcatccgggt ctaccgggct gcctaagggt 
agattttctc atgctaggga ccctattttt 

ctgtcggtgg tgccctttca tcatgggttt

tgtgggttta gagtggtgct catgtatagg
caagattata agattcagtc tgctctgctg 
tctacgctca tagacaagta tgacttgtcc 
cctctgtcta aggaggtagg tgaggctgtg 

caggggtatg ggctaacaga aacaacatct
aagcccgggg ctgtagggaa agtggtgccc 
accggtaaga cactaggggt gaaccagcgt 
atgtcggggt acgttaacaa ccccgaagct 
ctgcattcgg gcgacattgc ttactgggat 

ctgaagtcgt tgatcaaata caaggggtat
ctgcttcaac atcccaacat tttcgatgct 
ggggagttgc ctgctgctgt agtggtgctt 
atcgtggatt atgtggcttc acaagtgaca 



aagaaggggc ctgctccttt ctaccctctt 60 
aaagctatga agcggtatgc tcttgtgcca 120
gaggtggaca tcacatacgc tgagtatttt 180 
aagcgctatg ggctgaatac aaaccataga 240 
ttctttatgc ccgtgctggg ggctctcttc 300 
atctacaacg agcgagagct gttgaactcg 360 
gtgagtaaga aagggcttca aaagattctc 420
aagattatta ttatggactc taagaccgac 480 
gtgacatctc atctgcctcc tggcttcaac 540 
agagataaga caattgctct gatcatgaat 600 
gtagctctgc cccatagaac agcttgtgtg 660 
gggaatcaga ttattcctga cactgctatt 720 
gggatgttta caacactggg ctacctaata 780 
tttgaagaag agctgttctt acgctctttg 840 
gtgccaacac tattctcttt ttttgctaag 900
aacttgcacg agattgcttc tggcggagca 960 
gctaagcgct ttcatctgcc tggtatcaga 1020 
gctattctga ttacaccaga gggggatgat 1080 
ttttttgaag ccaaagtagt tgatcttgat 1140 
ggtgaactgt gtgtgagagg gcctatgatt 1200 
acaaatgctc tgattgataa ggatggctgg 1260 
gaggatgagc atttcttcat cgtggacaga 1320 
caagtagctc ctgctgagct ggaatccatt 1380
ggggtggctg ggctgcctga tgatgatgct 1440 
gagcacggta agacaatgac agagaaggag 1500 
acagctaaga aactgagagg tggcgttgtg 1560 
tttgtggatg aggtgcctaa agggctcact
attctgatta aggctaagaa gggtggaaag 

<210> 27 
<211> 1672
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> A synthetic construct.
<400> 27 

aaagccacca tggaagatgc taaaaacatt 
gaagatggga ctgctggcga gcaacttcac 

ggcacaattg cgttcacgga tgctcacatt
gagatgtcgg tgcggctggc agaagctatg 
attgtagtgt gcagtgagaa ctcgttgcag 
atcggggtgg ctgtggctcc tgctaacgac 
atggggatct ctcagcctac agtggtgttt

aatgtgcaaa agaagctgcc tattattcaa
taccaggggt ttcagtccat gtatacattt 
gagtacgact tcgtgcccga gtctttcgac
tcatccgggt ctaccgggct gcctaagggt
agattctctc atgccaggga cccgatcttt 

ctgtcggtgg tgccctttca tcatgggttt
tgtgggttta gagtggtgct catgtatagg 
caagattata agattcagtc tgctctgctg 
tctacgctca tagacaagta tgacttgtcc 
cctctgtcta aggaggtagg tgaggctgtg 

caggggtacg ggctaacaga aacaacttct
aagcccgggg ctgtagggaa agtggtgccc 
accggtaaga cactaggggt gaaccagcgt 
atgtcggggt acgttaacaa ccccgaagct 
ttgcattcgg gcgacattgc ctactgggat 

ctgaagtcgt tgatcaaata caaggggtat
ctgcttcaac atccaaacat tttcgatgct 
ggagagttgc ctgctgctgt agtagtgctt
atcgtggatt atgtggcttc acaagtgaca 
tttgtggatg aggtgcctaa agggctcact 

attctcatta aggctaagaa gggtggaaag
ggcaagctgg


atgctagaaa 


aattcgagag 


1620 


attgctgtgt 


aatagttcta 


ga 


1672 


ddyddyyyy^- 


LLytL^UULL 




60 


^ 3 a fir* 4" a f" (*f 25 

ddcty ^ ucL u y ct 


ay uyy tauy l 


l. l. y uy Li^ci 


120 


gaggcggaca 


LudLn LaLyu 


uy ay lollll 




aagcgctatg


y y I— uydaUaL 


aaaULaLaya 


240 


i_ +_ _ *_ *_ i- _ i_ _ 


ccy cyoi.yyy 


/-»/-»/-» f- 4- 4- 4- ^ 

yyuu^L-^tLi^ 




^ 4" /™* 4™ ^ ^ ^ ^ r*rr 


ayuyayayL l. 


y u uy adiv l> 




y tyay uaaya 


s 3 (rant* t~ +t /~> a 


dday d L» L.U 


420 


ddyaULdtLd 


4~ 4~ o 4~ rvrr a /*■* 4~ 
C LdLgydLLL 


4~ o s n s a ft a /■» 
LddydLdy dL 


480


yLyaCaLCLC 


-a 4- /— < 4~ trr* r+ \~ r*r* 
dLULyLLLLL 


t* cine* Uraap 

y y i^aau 




dy dy dl-ddy a. 


a a 4~ 4~ Af< 4— 4— 
LdaL LyLLL l_ 


ydLUd Lyadt 


600


rf™* 4" a ***»^4~ /-« 4~ /^r 

gtagctctgc


LCLaLtydaL 


ayctLytyty 




ygyd.aX.Cdyd 


+"4-a 4* 4- /*■ r"* ft a 
LCdLLOUCyd 


LaLUy^Ld U U 


720


r*m*~t a 4~ *"r4* 4~ 4" a 

gggatgttta


caacactggg


aha/ 1 ^ f"aa("a 
d LdLL Lddld 


780


u u tyactyady 


aaL L. y u L. l_» L. L. 


dv^y llll L.y 




y UyuuadLdL 


tdLLLLLLLL 


lll i^y Lddy 




aaut uyudLy 


ayaL LyLtL^> 




960 


y uaciy 




i«yy L>a i_>\_iicii^a 


1020 


err* t" a t* t* p t aa 




acjerrrra tcra c 
mmm ^y ^m**^ 


1080 


ftttttaaaa 


rrasaoh ant" 


f- rj a f- r> t* t - cr a t 

L.y utv^t uya 


1140 


rjot" era art" cr t~ 

yy uy nav ™*y 


y t-y L.y <-yyyy 


nppf a t" oa t" t 


1200 


acaaatgctc


ntactgataa 


ggatggctgg 


1260


gaggatgagc 


atttcttcat 


cgtggacaga 


1320 


caagtagctc 


ctgctgagct


ggaatccatt 


1380 


ggggtggctg 


ggctgcctga 


tgatgatgct


1440


gagcacggta


agacaatgac 


agagaaggag 


1500 


acagctaaga 


aactgagagg 


tggcgttgtg 


1560 


ggcaagctgg 


atgccagaaa 


aattcgagag 


1620 


attgctgtgt 


aatagttcta 


ga 


1672 
<210> 28
<211> 1672 
<212> DNA 

<213> Artificial Sequence 


<220> 

<223> A synthetic construct. 



<400> 28 



aaagccacca


tggaagatgc 


taaaaacatt 


aagaaggggc 


ctgctccctt 


ctaccctctt 


60


gaagatggga 


ctgctggcga 


gcaacttcac 


aaagctatga 


agcggtatgc 


tcttgtgcca 


120


ggcacaattg 


cgttcacgga 


tgctcacatt 


gaggtggaca 


tcacatacgc 


tgagtantcc 


180


gagatgtcgg 


tgcggctggc 


agaagctatg 


aagcgctatg 


ggctgaatac 


aaaccataga 


240


attgtagtgt 


gcagtgagaa 


ctcgttgcag 


ttctttatgc 


ccgtgctggg 


ggctctcttc 


300


atcggggtgg


ctgtggctcc 


tgctaacgac 


atctacaacg 


agcgagagct 


gttgaactcg 


360


atggggatct 


ctcagcctac 


agtggtgttt 


gtgagtaaga 


aagggcttca


aaagattctc 


420


aatgtgcaaa 


agaagctgcc 


tattatacaa 


aagattatta 


ttatggactc


taagaccgac 


480


taccaggggt 


ttcagtccat 


gtacacattt 


gtaacctctc 


atctgcctcc


tggcttcaac 


540


gagtacgact 


tcgtgcccga 


gtctttcgac 


agggacaaaa 


cgattgctct 


gatcatgaac 


600


tcatccgggt


ctaccgggct 


gcctaagggt 


gtagctctgc 


cccatcgaac 


agcttgtgtg 


660


agattctctc 


atgccaggga 


cccgatcttt 


gggaatcaga 


ttattcctga 


cactgctatt 


720 


ctgtcggtgg 


tgccctttca 


tcatgggttt 


gggatgttca 


caacactggg 


atacctcatt 


780 


tgcgggttta 


gagtggtgct 


catgtatagg 


tttgaagaag 


aactattcct 


acgctctttg 


840 


caagattata 


agattcagtc 


tgctctgctg 


gtgccaacac 


tattctcttt 


ttttgctaag 


900 


tctacgctca


tagacaagta 


tgacttgtcc 


aacttgcacg 


agattgcttc 


tggcggagca 


960 


cctctgtcta 


aggaggtagg 


tgaggctgtg 


gctaagcgct 


ttcatctgcc 


tggtatcaga 


1020 


caggggtacg 


ggctaacaga 


aacaacttct 


gctattctga 


ttacaccaga 


gggcgatgac 


1080 


aaacccgggg 


ctgtagggaa 


agtggtgccc 


ttttttgaag 


ccaaagtagt 


tgatcttgat 


1140 


accggtaaga 


cactaggggt 


gaaccagcgt 


ggtgaactgt 


gtgtgcgggg 


ccctatgatt 


1200 


atgtcggggt


acgttaacaa 


ccccgaagct 


acaaatgctc 


ttattgataa 


ggatggctgg 


1260 


ttgcattcgg 


gcgacattgc


ctactgggat 


gaggatgagc 


atttcttcat 


cgtggacaga 


1320 


ctgaagtcgt 


tgatcaaata 


caaggggtat 


caagtagctc 


ctgctgagct 


ggaatccatt 


1380 


ctgcttcaac 


atcctaacat 


tttcgatgct 


ggggtggctg 


ggctgcctga 


tgatgatgct 


1440 


ggagagttgc 


ctgctgctgt 


agtagtgctt 


gagcacggta 


agacaatgac 


agagaaggag 


1500 


atcgtggatt


atgtggcttc 


acaagtgaca 


acagctaaga 


aactgagagg 


tggcgttgtg 


1560 


tttgtggatg 


aggtgcctaa 


agggctcact 


ggcaagctgg 


atgccagaaa 


aattcgagag 


1620 


attctcatta 


aggctaagaa 


gggtggaaag 


attgctgtgt 


aatagttcta 


ga 


1672 



<210> 29 
<211> 1672
<212> DNA 

<213> Artificial Sequence 
<220>

<223> A synthetic construct. 
<400> 29 

aaagccacca tggaagatgc caaaaacatt aagaaggggc ctgctccctt ctaccctctt 60
gaagatggga ctgctggcga gcaacttcac aaagctatga agcggtatgc tcttgtgcca 120 
ggcacaattg cgttcacgga tgctcacatt gaagtagaca tcacatacgc tgagtatttt 180 
gagatgtcgg tgcggctggc agaagctatg aagcgctatg ggctgaatac aaaccataga 240 
attgtagtgt gcagtgagaa ctcgttgcag ttctttatgc ccgtgctggg ggctctcttc 300 

atcggggtgg ctgtggctcc tgctaacgac atctacaacg agcgagagct gttgaactcg 360
atggggatct ctcagcctac agtggtgttt gtgagtaaga aagggcttca aaagattctc 420 
aatgtgcaaa agaagctgcc tattatacaa aagattatta ttatggactc taagaccgac 480 
taccaggggt ttcagtccat gtacacattt gtaacctctc atctgcctcc tggcttcaac 540
gagtacgact tcgtgcccga gtctttcgac agggacaaaa cgattgctct gatcatgaac 600

agctccgggt ctaccgggct gcctaagggt gtagctctgc cccatcgaac agcttgtgtg 660
agattctctc atgccaggga cccgatcttt ggaaaccaga tcatccctga cactgctatt 720
ctgtcggtgg tgccctttca tcatgggttt gggatgttca caacactggg atacctcatt 780
tgcgggttta gagtggtgct catgtatagg tttgaagaag aactattcct acgctctttg 840 
caagattata agattcagtc tgctctgctg gtgccaacac tattctcttt ttttgctaag 900 

tctacgctca tagacaagta tgacttgtcc aacttgcacg agattgcttc tggcggagca 960
cctctgtcta aggaggtagg tgaggctgtg gctaagcgct ttcatctgcc tggtatcaga 1020
caggggtacg ggctaacaga aacaacttct gctattctga ttacaccaga gggcgatgac 1080 
aaacccgggg ctgtagggaa agtggtgccc ttttttgaag ccaaagtagt tgatcttgat 1140 
accggtaaga cactaggggt gaaccagcgt ggtgaactgt gtgtgcgggg ccctatgatt 1200 

atgtcggggt acgttaacaa ccccgaagct acaaatgctc tcatagacaa ggacgggtgg 1260
cttcatagcg gcgacattgc ctactgggac gaggatgagc atttcttcat cgtggacaga 1320
ctgaagtcgt tgatcaaata caaggggtat caagtagctc ctgctgagct ggaatccatt 1380
ctgcttcaac accccaatat cttcgatgct ggggtggctg ggctgcctga tgatgatgct 1440 
ggagagctgc ctgctgctgt agtagtgctt gagcacggta agacaatgac agagaaggag 1500 

atcgtggatt atgtggcttc acaagtgaca acagctaaga aactgagagg tggcgttgtg 1560
tttgtggatg aggtgcctaa agggctcact ggcaagctgg atgccagaaa aattcgagag 1620 
attctcatta aggctaagaa gggtggaaag attgctgtgt aatagttcta ga 1672 

<210> 30 
<211> 1056
<212> DNA

<213> Artificial Sequence
<220>

<223> A synthetic construct.
<400> 30

ccactcagtg gccaccatga agaagcccga gctgaccgct accagcgttg agaagttcct 60
gatcgagaag ttcgacagcg tgagcgacct gatgcagtta agcgagggcg aggaaagccg 120
cgccttcagc ttcgatgtcg gcggacgcgg ctatgtactg cgggtgaata gctgcgctga 180
tggcttctac aaagaccgct acgtgtaccg ccacttcgcc agcgctgcac tgcccatccc 240
cgaggtgctg gacatcggcg agttcagcga gagcctgaca tactgcatca gccgccgcgc 300
tcaaggcgtg actctccaag acctgcccga gacagagctg cccgctgtgc tacagcctgt 360
cgccgaggct atggacgcta ttgccgccgc cgacctgagc cagaccagcg gcttcggccc 420
attcgggccc caaggcatcg gccagtacac cacctggcgc gacttcatct gcgccattgc 480
tgatccccat gtctaccact ggcagaccgt gatggacgac accgtgagcg ccagcgtagc 540
tcaagccctg gacgagctga tgctgtgggc cgaggactgc cccgaggtgc gccatctcgt 600
ccatgccgac ttcggcagca acaacgtcct gaccgacaac ggccgcatca ccgccgtaat 660
cgactggagc gaggccatgt tcggggacag tcagtacgag gtggccaaca tcttcttctg 720
gcggccctgg ctggcctgca tggagcagca aacccgctac ttcgagcgcc gccatcccga 780
gctggccggc agcccccgtc tgcgagccta catgctgcgc atcggcctgg atcagctcta 840
ccagagcctc gtggacggca acttcgacga tgctgcctgg gctcaaggcc gctgcgatgc 900
catcgtccgc agcggggccg gcaccgtcgg tcgcacacaa atcgctcgcc ggagcgccgc 960
cgtatggacc gacggctgcg tcgaggtgct ggccgacagc ggcaaccgcc ggcccagtac 1020
acgaccgcgc gctaaggagt agtaaccagc tcttgg 1056

<210> 31
<211> 1672
<212> DNA

<213> Artificial Sequence



<220>

<223> A synthetic construct. 
<400> 31 

aaagccacca tggaagatgc caaaaacatt 
gaagatggga ctgctggcga gcaacttcac
gggacaattg cgttcacgga tgctcacatt 
gagatgtcgg tgcggctggc agaagctatg 
attgtagtgt gcagtgagaa ctcgttgcag 
atcggggtgg ctgtggctcc tgctaacgac 
atggggatct ctcagcctac agtggtgttt
aatgtgcaaa agaagctacc gatcatacaa 
taccaggggt ttcagtccat gtacacattt 
gagtacgact tcgtgcccga gtctttcgac 
agctccgggt ctaccgggct gcctaagggt 
agattctctc atgccaggga cccgatcttt
ctgtcggtgg tgccctttca tcatgggttt 
tgcgggttta gagtggtgct catgtatagg 
caagattata agattcagtc tgctctgctg 



aagaaggggc ctgctccctt ctaccctctt 60 
aaagctatga agcggtatgc tcttgtgcca 120 
gaagtagaca tcacatacgc tgagtatttt 180 
aagcgctatg ggctgaatac aaaccataga 240 
ttctttatgc ccgtgctggg ggctctcttc 300 
atctacaacg agcgagagct gttgaactcg 360 
gtgagtaaga aagggcttca aaagattctc 420 
aagatcatca tcatggatag caagaccgac 480 
gtaacctctc atctgcctcc tggcttcaac 540 
agggacaaaa cgattgctct gatcatgaac 600 
gtagctctgc cccatcgaac agcttgtgtg 660 
ggaaaccaga tcatccctga cactgctatt 720 
gggatgttca caacactggg atacctcatt 780 
tttgaagaag aactattcct acgctctttg 840 
gtgccaacac tattctcttt ttttgctaag 900 
trtacactca


tagacaagta 


tgacttgtcc 


aacttgcacg


agattgcttc


tggcggagca


960 






tgaggctgtg


gctaagcgct


ttcatctgcc


tggtatcaga


1020 


n a aaaa t~ a r* a 

uei yy y y *-«.<-y 


aactaacaaa 


aacaacttct 


gctattctga 


ttacaccaga 


qqqcaatcfac 



1080 


a. ctcii_. v_ i~y y y y 


r* t* q t" a aaa a a 


agtggtgccc 


ttttttgaag 


ccaaagtagt 


tgatcttgat 


1140 


^arnnnt' A ana 
d^. i-«y y u cidy a 


c* a r* t a crcfcrcf t 
i«cii*> i-»yyyy *- 


aaaccaacat 


aataaactat 


atatacaoQcr 



ccctatgatt 


1200 


atgtcggggt 


acgttaacaa 


ccccgaagct 


acaaatgctc


tcatagacaa 


ggacgggtgg 


1260 


cttcatagcg 


gcgacattgc 


ctactgggac 


gaggatgagc 


atttcttcat 


cgtggacaga 


1320 


vj uy dciy L.<_y u 


f- era hfaaai-a 
uycLL.t-'CicLct L-Cl 


fiaa CICIClClt~ a t* 
i> ciay y y y k> u l 




ctaccaaact 


taaatccatt 

lp»^«| CL^H L— V* w CI L— 


1380 


j-1 4- 4~ /I 3 O /I 


Ci\*\*\~\*a.Cl L.CL L> 




acta ci i~ acic t~ a 

yyyy *-yy<-*-y 


ClClCf'ClCC^'CI^i 

*»y *■* ^-y c*. 


t ci a t" ci a 1~ cr r* t* 

ci ^y ci v id- 


1440 


"1 rt jTf-T ^ y-w /-» j-t ^ /-» /t 

lugg ay a.y cty c 




arrt" ant - npt" i~ 
ciy u ciy uyv* i- i— 


y ciy ct >— y y i— o 


a era r*a a t~crar* 
ciy cii^cici L.y 


aaacra acrcr acr 
c*y ciy ciay y ciy 


1500 


atcgtggatt 


atgtggcttc 


acaagtgaca 


acagctaaga 


aactccgagg 


tggcgttgtg 


1560 


tttgtggatg 


aggtgcctaa 


agggctcact 


ggcaagctgg 


atgccagaaa 


aattcgagag 


1620 


attctcatta 


aggctaagaa 


gggtggaaag 


ct l. cy \* L.y *-y u 


a a t~ a rr"t~ i- r« i~ a 
cLctucty i— L- i— d 


yd 


1672 


<210> 32














<211> 1672 














<212> DNA 














<213> Artificial Sequence 










<220>














<223> A synthetic construct. 










<400> 32 














aaagccacca 


tggaagatgc 


caaaaacatt 


a a era a cirrcrcrr 1 

CJ.caycxcxyyyy^ 


ctacfcccctfc 


ctaccctctt 


60 


gaagatggga


ctgctggcga 


gcaacttcac 


a aaart'fl t"aa 
cm ciy w i-- a. i_y a. 


aacaatafeac 
ciy y y c*. *»• y w 


tcttcrtacca 


120 


gggacaattg 


cgttcacgga 


tgctcacatt 


y ci civj ciy q ci 


t cacahacac 


tgagtatttt 


180 


gagatgtcgg 


tgcggctggc 


agaagctatg 


aacrccfctatcr 


ggctgaatac 


aaaccataga 


240 


attgtagtgt 


gcagtgagaa 


ctcgttgcag 


ttctttatac 


cccf tact aaa 


ggctctcttc 


300 


attggggtgg 


ctgtggctcc 


tgctaatgac


atctacaaca 

Gl w d^>* t*ld w^»J 


aacaaaaGC t 

3 3 3 3 


gttgaacagt 


360 


atggggatct


ctcagcctac 


agtggtgttt 


erfceraert aaaa 

y ^y ciy ciciy ci 


aaaaacttca 
y y y vd 


aaaaattctc 


420 


aatgtgcaaa 


agaagctacc 


gatcatacaa 


aagatcatca 


tcatggatag 


caagaccgac 


480 


taccaggggt 


ttcagtccat 


gtacacattt 


gtaacctctc 


atctgcctcc 


tggcttcaat 


540 


gagtatgact 


tcgtgcccga 


gtctttcgac 


agggacaaaa


cgattgctct


gatcatgaac 


600 


agcagtgggt 


ctaccgggct 


gcctaagggt


qtaactctoc 


cccatcgaac 


aacttcrtcrtcr 


660 


agattctctc


atgccaggga 


cccgatcttt 


ggaaaccaga 


tcatccctga


cactgctatt


720 


ctgtcggtgg 


tgccctttca 


tcatgggttt 


ffrrfia t~ fit - +■ /•'a 


naarapf nan 
ctctv^cti^ uy y y 






tgcgggttta 


gagtggtgct 


catgtatagg 


tttgaagaag 


aactattcct 


acgctctttg


840 


caagattata 


agattcagtc 


tgctctgctg 


gtgccaacac 


tattctcttt 


ttttgctaag 


900 


tctacgctca 


tagacaagta 


tgacttgtcc 


aacttgcacg


agattgcttc


tggcggagca


960 


cctctgtcta


aggaggtagg 


tgaggctgtg 


gctaagcgct


ttcatctgcc 


tggtatcaga


1020 


caggggtacg 


ggctaacaga 


aacaacttct 


gctattctga 


ttacaccaga 


gggcgatgac


1080 


aaacctgggg 


ctgtagggaa 


agtggtgccc 


ttttttgaag 


ccaaagtagt 


tgatcttgat 


1140 


accggtaaga 


cactaggggt 


gaaccagaga 


ggtgaattgt 


gtgtgagggg 


ccctatgatt 


1200 
atgtcggggt acgttaacaa ccccgaagct acaaatgctc tcatagacaa ggacgggtgg 1260

cttcatagtg gagatattgc ctactgggat gaagatgagc atttcttcat cgtggacaga 1320

ctgaagtcgt tgatcaaata caaggggtat caagtagctc ctgccgagct tgagtccatt 1380

ctgcttcaac accccaatat cttcgatgct ggggtggctg ggctgcctga tgatgatgct 1440

ggagagctgc ctgctgctgt agtagtgctt gagcatggta agacaatgac agagaaggag 1500

atcgtggatt atgtggcttc acaagtgaca acagctaaga aactccgagg tggcgttgtg 1560

tttgtggatg aggtgcctaa agggctcact ggcaagctgg atgccagaaa aattcgagag 1620

attctcatta aggctaagaa gggtggaaag attgctgtgt aatagttcta ga 1672

<210> 33

<211> 1672

<212> DNA

<213> Artificial Sequence
<220>

<223> A synthetic construct.

<400> 33

aaagccacca tggaagatgc caaaaacatt aagaaggggc ctgctccctt ctaccctctt 60

gaagatggga ctgctggcga gcaacttcac aaagctatga agcggtatgc tcttgtgcca 120

gggacaattg cgttcacgga tgctcacatt gaagtagaca tcacatacgc tgagtatttt 180

gagatgtcgg tgcggctggc agaagctatg aagcgctatg ggctgaatac aaaccataga 240

attgtagtgt gcagtgagaa ctcgttgcag ttctttatgc ccgtgctggg ggctctcttc 300

attggggtgg ctgtggctcc tgctaatgac atctacaacg agcgagagct gttgaacagt 360

atggggatct ctcagcctac agtggtgttt gtgagtaaga aagggcttca aaagattctc 420

aatgtgcaaa agaagctacc gatcatacaa aagatcatca tcatggatag caagaccgac 480

taccaggggt ttcagtccat gtacacattt gtaacctctc atctgcctcc tggcttcaat 540

gagtatgact tcgtgcccga gtctttcgac agggacaaaa cgattgctct gatcatgaac 600

agcagtgggt ctaccgggct gcctaagggt gtagctctgc cccatcgaac agcttgtgtg 660

agattctctc atgccaggga cccgatcttt ggaaaccaga tcatccctga cactgctatt 720

ctgtcggtgg tgccctttca tcatgggttt gggatgttca caacactggg atacctcatt 780

tgcgggttta gagtggtgct catgtatagg tttgaagaag aactattcct acgctctttg 840

caagattata agattcagtc tgctctgctg gtgccaacac tattctcttt ttttgctaag 900

tctacgctca tagacaagta tgacttgtcc aacttgcacg agattgcttc tggcggagca 960

cctctgtcta aggaggtagg tgaggctgtg gctaagcgct ttcatctgcc tggtatcaga 1020

caggggtacg ggctaacaga aacaacttct gctattctga ttacaccaga gggcgatgac 1080

aaacctgggg ctgtagggaa agtggtgccc ttttttgaag ccaaagtagt tgatcttgat 1140

accggtaaga cactaggggt gaaccagaga ggtgaattgt gtgtgagggg ccctatgatt 1200

atgtcggggt acgttaacaa ccccgaagct acaaatgctc tcatagacaa ggacgggtgg 1260

cttcatagtg gagatattgc ctactgggat gaagatgagc atttcttcat cgtggacaga 1320

ctgaagtcgt tgatcaaata caaggggtat caagtagctc ctgccgagct tgagtccatt 1380

ctgcttcaac accccaatat cttcgatgct ggggtggctg ggctgcctga tgatgatgct 1440

ggagagctgc ctgctgctgt agtagtgctt gagcatggta agacaatgac agagaaggag 1500

atcgtggatt atgtggcttc acaagtgaca acagctaaga aactccgagg tggcgttgtg 1560

tttgtggatg aggtgcctaa aggactcact ggcaagctgg atgccagaaa aattcgagag 1620 

attctcatta aggctaagaa gggtggaaag attgctgtgt aatagttcta ga 1672 

<210> 34
<211> 10
<212> DNA

<213> Artificial Sequence
<220>

<223> A synthetic construct. 
<400> 34 

gccaccatga 10 

<210> 35
<211> 11
<212> DNA

<213> Artificial Sequence

<220>

<223> A synthetic construct.
<220>

<221> misc_feature
<222> 4, 5, 6, 7, 8
<223> n = A,T,C or G

<400> 35

ccannnnntg g 11

<210> 36 
<211> 25 
<212> DNA 
<213> Artificial Sequence

<220> 

<223> A synthetic construct. 

<220>

<221> misc_feature 

<222> 1, 2, 3, 4, 5, 9, 10, 11, 12, 13 
<223> n = A,T,C or G 
<400> 36

nnnnnccann nnntggccac catgg 25 

<210> 37 
<211> 20
<212> DNA

<213> Artificial Sequence
<220>

<223> A synthetic construct.
<220>

<221> misc_feature

<222> 10, 11, 12, 13, 14, 18, 19, 20
<223> n = A,T,C or G

<400> 37

taataaccan nnnntggnnn 20

<210> 38
<211> 825
<212> DNA

<213> Artificial Sequence

<220>

<223> A synthetic construct. 

<400> 38 

ccactcagtg gccaccatga tcgagcagga cggcctccat gctggcagtc ccgcagcctg 60 

ggtcgagcgc ttgttcgggt acgactgggc ccagcagacc atcggatgta gcgatgccgc 120

agtgttccgc ctgagcgctc aaggccggcc cgtgctgttc gtgaagaccg acctgagcgg 180 

cgccctgaac gagcttcaag acgaggctgc ccgcctgagc tggctggcca ccaccggtgt 240 

accctgcgcc gctgtgttgg atgttgtgac cgaagccggc cgcgactggc tgctgctggg 300 

cgaggtgcct ggccaggacc tgctgagcag ccacctggcc cccgctgaga aggtgagcat 360 

catggccgac gccatgcggc gcctgcacac cctggacccc gctacatgcc ccttcgacca 420

ccaggctaag caccgcatcg agcgggctcg gacccgcatg gaggccggcc tggtggacca 480 

ggacgacctg gacgaggagc accagggcct ggcccccgct gaactgttcg cccgcctgaa 540 

agcccgcatg ccggacggtg aggacctggt tgtgacacac ggcgacgcct gcctccctaa 600 

catcatggtc gagaacgggc gcttctccgg cttcatcgac tgcggccgcc tgggcgttgc 660 

cgaccgctac caggacatcg ccctggccac ccgcgacatc gccgaggagc tgggcggcga 720

gtgggccgac cgcttcctgg tcttgtacgg catcgcagct cccgacagcc agcgcatcgc 780 

cttctaccgc ctgctggacg agttcttcta gtaaccaggc tctgg 825 
<210> 39
<211> 825
<212> DNA

<213> Artificial Sequence

<220>

<223> A synthetic construct.

<400> 39

ccactccgtg gccaccatga tcgaacaaga cggcctccat gctggcagtc ccgcagcttg 60

ggtcgaacgc ttgttcgggt acgactgggc ccagcagacc atcggatgta gcgatgcggc 120

cgtgttccgt ctaagcgctc aaggccggcc cgtgctgttc gtgaagaccg acctgagcgg 180

cgccctgaac gagcttcaag acgaggctgc ccgcctgagc tggctggcca ccaccggtgt 240

accctgcgcc gctgtgttgg atgttgtgac cgaagccggc cgggactggc tgctgctggg 300

cgaggtccct ggccaggatc tgctgagcag ccaccttgcc cccgctgaga aggtttccat 360

catggccgat gcaatgcggc gcctgcacac cctggacccc gctacatgcc ccttcgacca 420

ccaggctaag catcggatcg agcgtgctcg gacccgcatg gaggccggcc tggtggacca 480

ggacgacctg gacgaggagc atcagggcct ggcccccgct gaactgttcg cccgcctgaa 540

agcccgcatg ccggacggtg aggacctggt tgtgacacat ggagatgcct gcctccctaa 600

catcatggtc gagaatggcc gcttctccgg cttcatcgac tgcggtcgcc taggagttgc 660

cgaccgctac caggacatcg ccctggccac ccgcgacatc gctgaggagc ttggcggcga 720

gtgggccgac cgcttcttag tcttgtacgg catcgcagct cccgacagcc agcgcatcgc 780

cttctaccgc ctgctcgacg agttctttta atgaccaggc tctgg 825



<210> 40

<400> 40
000

<210> 41
<211> 861
<212> DNA

<213> Escherichia coli

<400> 41

atgagtattc aacatttccg tgtcgccctt attccctttt ttgcggcatt ttgccttcct 60
gtttttgctc acccagaaac gctggtgaaa gtaaaagatg ctgaagatca gttgggtgca 120
cgagtgggtt acatcgaact ggatctcaac agcggtaaga tccttgagag ttttcgcccc 180
gaagaacgtt ttccaatgat gagcactttt aaagttctgc tatgtggcgc ggtattatcc 240
cgtattgacg ccgggcaaga gcaactcggt cgccgcatac actattctca gaatgacttg 300
gttgagtact caccagtcac agaaaagcat cttacggatg gcatgacagt aagagaatta 360
tgcagtgctg ccataaccat gagtgataac actgcggcca acttacttct gacaacgatc 420
ggaggaccga aggagctaac cgcttttttg cacaacatgg gggatcatgt aactcgcctt 480
gatcgttggg aaccggagct gaatgaagcc ataccaaacg acgagcgtga caccacgatg 540
cctgtagcaa tggcaacaac gttgcgcaaa ctattaactg gcgaactact tactctagct 600
tcccggcaac aattaataga ctggatggag gcggataaag ttgcaggacc acttctgcgc 660
tcggcccttc cggctggctg gtttattgct gataaatctg gagccggtga gcgtgggtct 720
cgcggtatca ttgcagcact ggggccagat ggtaagccct cccgtatcgt agttatctac 780
acgacgggga gtcaggcaac tatggatgaa cgaaatagac agatcgctga gataggtgcc 840
tcactgatta agcattggta a 861



<210> 42

<211> 1056

<212> DNA

<213> Artificial Sequence

<220>

<223> A synthetic construct.

<400> 42

ccactccgtg gccaccatga agaagcccga gctgaccgct accagcgttg aaaaatttct 60
catcgagaag ttcgacagtg tgagcgacct gatgcagttg tcggagggcg aagagagccg 120
agccttcagc ttcgatgtcg gcggacgcgg ctatgtactg cgggtgaata gctgcgctga 180
tggcttctac aaagaccgct acgtgtaccg ccacttcgcc agcgctgcac tacccatccc 240
cgaagtgttg gacatcggcg agttcagcga gagcctgaca tactgcatca gtagacgcgc 300
ccaaggcgtt actctccaag acctccccga aacagagctg cctgctgtgt tacagcctgt 360
cgccgaagct atggatgcta ttgccgccgc cgacctcagt caaaccagcg gcttcggccc 420
attcgggccc caaggcatcg gccagtacac aacctggcgg gatttcattt gcgccattgc 480
tgatccccat gtctaccact ggcagaccgt gatggacgac accgtgtccg ccagcgtagc 540
tcaagccctg gacgaactga tgctgtgggc cgaagactgt cccgaggtgc gccacctcgt 600
ccatgccgac ttcggcagca acaacgtcct gaccgacaac ggccgcatca ccgccgtaat 660
cgactggagc gaggctatgt tcggggacag tcagtacgag gtggccaaca tcttcttctg 720
gcggccctgg ctggcttgca tggagcagca gactcgctac ttcgagcgcc ggcatcccga 780
gctggccggc agccctcgtc tgcgagccta catgctgcgc atcggcctgg atcagctcta 840
ccagagcctc gtggacggca acttcgacga tgctgcctgg gctcaaggcc gctgcgatgc 900
catcgtccgc agcggggccg gcaccgtcgg tcgcacacaa atcgctcgcc ggagcgccgc 960
cgtatggacc gacggctgcg tcgaggtgct ggccgacagc ggcaaccgcc ggcccagtac 1020
acgaccgcgc gctaaggagt agtaaccagc tcttgg 1056



<210> 43 

<211> 1653 

<212> DNA 

<213> Artificial Sequence

<220>

<223> A synthetic construct.
<400> 43

atggaagacg ccaaaaacat aaagaaaggc ccggcgccat tctatccgct ggaagatgga 60
accgctggag agcaactgca taaggctatg aagagatacg ccctggttcc tggaacaatt 120
gcttttacag atgcacatat cgaggtggac atcacttacg ctgagtactt cgaaatgtcc 180
gttcggttgg cagaagctat gaaacgatat gggctgaata caaatcacag aatcgtcgta 240
tgcagtgaaa actctcttca attctttatg ccggtgttgg gcgcgttatt tatcggagtt 300
gcagttgcgc ccgcgaacga catttataat gaacgtgaat tgctcaacag tatgggcatt 360
tcgcagccta ccgtggtgtt cgtttccaaa aaggggttgc aaaaaatttt gaacgtgcaa 420
aaaaagctcc caatcatcca aaaaattatt atcatggatt ctaaaacgga ttaccaggga 480
tttcagtcga tgtacacgtt cgtcacatct catctacctc ccggttttaa tgaatacgat 540
tttgtgccag agtccttcga tagggacaag acaattgcac tgatcatgaa ctcctctgga 600
tctactggtc tgcctaaagg tgtcgctctg cctcatagaa ctgcctgcgt gagattctcg 660
catgccagag atcctatttt tggcaatcaa atcattccgg atactgcgat tttaagtgtt 720
gttccattcc atcacggttt tggaatgttt actacactcg gatatttgat atgtggattt 780
cgagtcgtct taatgtatag atttgaagaa gagctgtttc tgaggagcct tcaggattac 840
aagattcaaa gtgcgctgct ggtgccaacc ctattctcct tcttcgccaa aagcactctg 900
attgacaaat acgatttatc taatttacac gaaattgctt ctggtggcgc tcccctctct 960
aaggaagtcg gggaagcggt tgccaagagg ttccatctgc caggtatcag gcaaggatat 1020
gggctcactg agactacatc agctattctg attacacccg agggggatga taaaccgggc 1080
gcggtcggta aagttgttcc attttttgaa gcgaaggttg tggatctgga taccgggaaa 1140
acgctgggcg ttaatcaaag aggcgaactg tgtgtgagag gtcctatgat tatgtccggt 1200
tatgtaaaca atccggaagc gaccaacgcc ttgattgaca aggatggatg gctacattct 1260
ggagacatag cttactggga cgaagacgaa cacttcttca tcgttgaccg cctgaagtct 1320
ctgattaagt acaaaggcta tcaggtggct cccgctgaat tggaatccat cttgctccaa 1380
caccccaaca tcttcgacgc aggtgtcgca ggtcttcccg acgatgacgc cggtgaactt 1440
cccgccgccg ttgttgtttt ggagcacgga aagacgatga cggaaaaaga gatcgtggat 1500
tacgtcgcca gtcaagtaac aaccgcgaaa aagttgcgcg gaggagttgt gtttgtggac 1560
gaagtaccga aaggtcttac cggaaaactc gacgcaagaa aaatcagaga gatcctcata 1620
aaggccaaga agggcggaaa gatcgccgtg taa 1653



<210> 44 

<211> 1369 

<212> DNA 

<213> Artificial Sequence

<220>

<223> A synthetic construct.

<400> 44

ggatccgttt gcgtattggg cgctcttccg ctgatctgcg cagcaccatg gcctgaaata 60
acctctgaaa gaggaacttg gttagctacc ttctgaggcg gaaagaacca gctgtggaat 120
gtatgtcagt tagggtgtgg aaagtcccca ggctccccag caggcagaag tatgcaaagc 180
atgcatctca


attagtcagc 


aaccaggtgt 


ggaaagtccc


caggctcccc 


agcaggcaga 


240 


agtatgcaaa


gcatgcatct


caattagtca 


gcaaccatag 


tcccgcccct 


aactccgccc 


300 


atcccacccc 


taactccgcc 


cagttccgcc 


cattctccgc 


cccatggctg 


actaattttt 


360 


fcfctatttata 


cagaggccga


ggccgcctct 


erect ctqacic 


tattccagaa gtagtgagga 


420 


5aacfc thttta 

_? y y W l» l> l> UkVf^j 


cracra c c t a era 


cttttgcaaa


aagctcgatt 


cttctgacac 


tagcgccacc 


480 


a t* cf a f* pcra a p 


a a oa r* aac p t* 


pea t" a c fccrac 


agtcccgcag 


cttgggtcga 


acgcttgttc 


540 


ncrrfh a pcra c t~ 
yy y Lciuy l- 


crcrcr c p> p a ctp a 


aacratccfaa 

y cl v., a Lcvjy a 


tertaefcaata 


cggccgtgtt 


ccgtctaagc 


600 




yy v_)Wy y v— « i_ 


at" t*pcit*naacf 


arcaacctaa 


gcggcgccct


gaacgagctt 


660 


^aaij QV— y ciy y 


v_« *»y w >-* w y w v« 


cracfctaactcr 
y c*y w w^j y \* i»-y 


accaccaccQ 


y v_y uci^i>ul_| 


»-» y v*> v-« y v-< i-y v- y 


720 


1 flht-crcrat-cii-'ha 

1UL l-yy CI L.y t, l—y 


t" era p r*era a ctp 


pctctpp crcrcr a r* 
^yy^^yyyav* 


uyy x> w y v> *»y v-* 


tgggcgaggt 


ccctggccag 


780 


yciLU uy i— y ci 


y V-# Ciy *-» WMWVi fc» 


Uy Vr<u'\iiVV^V< 1— 


oacraaacitt t 


ctatcatggc 


cgatgcaatg


840 


p rro r* o p c* 1~ fr p 

yy y y 


a \_> cl<-> l. y y ci 


v_ Vw- y ^ l. ca v_ v^- 


i»y ^* w w l— _^^_<y 


accaccaggc 


taagcatcgg


900 


atcgagcgtg


ctcggacccg 


catggaggcc


ggcctggtgg 


accaggacga 


cctggacgag 


960


gagcatcagg 


gcctggcccc 


cgctgaactg 


ttcgcccgac 


tgaaagcccg 


catgccggac 


1020 


ggtgaggacc


tggttgtcac 


acacggagat


gcctgcctcc 


ctaacatcat 


ggtcgagaat 


1080 


ggccgcttct 


ccggcttcat 


cgactgcggt


cgcctaggag


ttgccgaccg 


ctaccaggac 


1140 


atcgccctgg 


ccacccgcga 


catcgctgag


gagcttggcg 


gcgagtgggc 


cgaccgcttc 


1200 


ttagtcttgt 


acggcatcgc 


agctcccgac 


agccagcgca 


tcgccttcta 


ccgcttgctc 


1260 


gacgagttct 


tttaatgatc 


tagaaccggt


catggccgca 


ataaaatatc 


tttattttca 


1320 


ttacatctgt


gtgttggttt 


tttgtgtgtt 


cgaactagat 


gctgtcgac 




1369 



<210> 45 
<211> 1214 
<212> DNA 
<213> Artificial Sequence

<220>

<223> A synthetic construct.

<400> 45

gcggccgcaa atgctaaacc actgcagtgg ttaccagtgc ttgatcagtg aggcaccgat 60
ctcagcgatc tgcctatttc gttcgtccat agtggcctga ctccccgtcg tgtagatcac 120
tacgattcgt gagggcttac catcaggccc cagcgcagca atgatgccgc gagagccgcg 180
ttcaccggcc cccgatttgt cagcaatgaa ccagccagca gggagggccg agcgaagaag 240
tggtcctgct actttgtccg cctccatcca gtctatgagc tgctgtcgtg atgctagagt 300
aagaagttcg ccagtgagta gtttccgaag agttgtggcc attgctactg gcatcgtggt 360
atcacgctcg tcgttcggta tggcttcgtt caactctggt tcccagcggt caagccgggt 420
cacatgatca cccatattat gaagaaatgc agtcagctcc ttagggcctc cgatcgttgt 480
cagaagtaag ttggccgcgg tgttgtcgct catggtaatg gcagcactac acaattctct 540
taccgtcatg ccatccgtaa gatgcttttc cgtgaccggc gagtactcaa ccaagtcgtt 600
ttgtgagtag tgtatacggc gaccaagctg ctcttgcccg gcgtctatac gggacaacac 660
cgcgccacat agcagtactt tgaaagtgct catcatcggg aatcgttctt cggggcggaa 720
agactcaagg atcttgccgc tattgagatc cagttcgata tagcccactc ttgcacccag 780
ttgatcttca gcatctttta ctttcaccag cgtttcgggg tgtgcaaaaa caggcaagca 840
aaatgccgca aagaagggaa tgagtgcgac acgaaaatgt tggatgctca tactcttcct 900
ttttcaatat gtttgcagca tttgtcaggg ttactagtac gtctctcttg agagaccgcg 960
atcgccacca tgtctaggta ggtagtaaac gaaagggctt aaaggcctaa gtggccctcg 1020
agtccagcct tgagttggtt gagtccaagt cacgtttgga gatctggtac cttacgcgta 1080
tgagggttga gtccaagtca cgtttggaga tctggtacct tacgcgtatg agctctacgt 1140
agctagcggc ctcggcggcc gaattcttgc gttcgaagct tggcaatccg gtactgttgg 1200
taaagccacc atgg 1214



<210> 46
<211> 1522 
<212> DNA 

<213> Artificial Sequence 



<220>

<223> A synthetic construct. 



<400> 46

gcggccgcaa atgctaaacc actgcagtgg ttaccagtgc ttgatcagtg aggcaccgat 60
ctcagcgatc tgcctatttc gttcgtccat agtggcctga ctccccgtcg tgtagatcac 120
tacgattcgt gagggcttac catcaggccc cagcgcagca atgatgccgc gagagccgcg 180
ttcaccggcc cccgatttgt cagcaatgaa ccagccagca gggagggccg agcgaagaag 240
tggtcctgct actttgtccg cctccatcca gtctatgagc tgctgtcgtg atgctagagt 300
aagaagttcg ccagtgagta gtttccgaag agttgtggcc attgctactg gcatcgtggt 360
atcacgctcg tcgttcggta tggcttcgtt caactctggt tcccagcggt caagccgggt 420
cacatgatca cccatattat gaagaaatgc agtcagctcc ttagggcctc cgatcgttgt 480
cagaagtaag ttggccgcgg tgttgtcgct catggtaatg gcagcactac acaattctct 540
taccgtcatg ccatccgtaa gatgcttttc cgtgaccggc gagtactcaa ccaagtcgtt 600
ttgtgagtag tgtatacggc gaccaagctg ctcttgcccg gcgtctatac gggacaacac 660
cgcgccacat agcagtactt tgaaagtgct catcatcggg aatcgttctt cggggcggaa 720
agactcaagg atcttgccgc tattgagatc cagttcgata tagcccactc ttgcacccag 780
ttgatcttca gcatctttta ctttcaccag cgtttcgggg tgtgcaaaaa caggcaagca 840
aaatgccgca aagaagggaa tgagtgcgac acgaaaatgt tggatgctca tactcttcct 900
ttttcaatat gtttgcagca tttgtcaggg ttactagtac gtctctcaag agatttgtgc 960
atacacagtg actcatactt tcaccaatac tttgcatttt ggataaatac tagacaactt 1020
tagaagtgaa ttatttatga ggttgtctta aaattaaaaa ttacaaagta ataaatcaca 1080
ttgtaatgta ttttgtgtga tacccagagg tttaaggcaa cctattactc ttatgctcct 1140
gaagtccaca attcacagtc ctgaactata atcttatctt tgtgattgct gagcaaattt 1200
gcagtataat ttcagtgctt ttaaattttg tcctgcttac tattttcctt ttttatttgg 1260
gtttgatatg cgtgcacaga atggggcttc tattaaaata ttcttgagag accgcgatcg 1320
ccaccatgtc taggtaggta gtaaacgaaa gggcttaaag gcctaagtgg ccctcgagtc 1380
cagccttgag ttggttgagt ccaagtcacg tttggagatc tggtacctta cgcgtatgag 1440
ctctacgtag ctagcggcct cggcggccga attcttgcgt tcgaagcttg gcaatccggt 1500
actgttggta aagccaccat gg 1522

<210> 47
<211> 1134
<212> DNA
<213> Artificial Sequence



<220> 

<223> A synthetic construct. 

10 

<400> 47

gcggccgcaa atgctaaacc actgcagtgg ttaccagtgc ttgatcagtg aggcaccgat 60
ctcagcgatc tgcctatttc gttcgtccat agtggcctga ctccccgtcg tgtagatcac 120
tacgattcgt gagggcttac catcaggccc cagcgcagca atgatgccgc gagagccgcg 180

ttcaccggcc cccgatttgt cagcaatgaa ccagccagca gggagggccg agcgaagaag 240
tggtcctgct actttgtccg cctccatcca gtctatgagc tgctgtcgtg atgctagagt 300
aagaagttcg ccagtgagta gtttccgaag agttgtggcc attgctactg gcatcgtggt 360
atcacgctcg tcgttcggta tggcttcgtt caactctggt tcccagcggt caagccgggt 420
cacatgatca cccatattat gaagaaatgc agtcagctcc ttagggcctc cgatcgttgt 480

cagaagtaag ttggccgcgg tgttgtcgct catggtaatg gcagcactac acaattctct 540
taccgtcatg ccatccgtaa gatgcttttc cgtgaccggc gagtactcaa ccaagtcgtt 600
ttgtgagtag tgtatacggc gaccaagctg ctcttgcccg gcgtctatac gggacaacac 660
cgcgccacat agcagtactt tgaaagtgct catcatcggg aatcgttctt cggggcggaa 720
agactcaagg atcttgccgc tattgagatc cagttcgata tagcccactc ttgcacccag 780

ttgatcttca gcatctttta ctttcaccag cgtttcgggg tgtgcaaaaa caggcaagca 840
aaatgccgca aagaagggaa tgagtgcgac acgaaaatgt tggatgctca tactcgtcct 900
ttttcaatat tattgaagca tttatcaggg ttactagtac gtctctcaag agatttgtgc 960
atacacagtg actcatactt tcaccaatac tttgcatttt ggataaatac tagacaactt 1020
tagaagtgaa ttatttatga ggttgtctta aaattaaaaa ttacaaagta ataaatcaca 1080

ttgtaatgta ttttgtgtga tacccagagg tttaaggcaa cctattactc ttat 1134

<210> 48 
<211> 319 
<212> DNA 
35<213> Artificial Sequence 

<220> 

<223> A synthetic construct. 
<400> 48

actagtacgt ctctcaagga taagtaagta atattaaggt acgggaggta cttggagcgg 60
ccgcaataaa atatctttat tttcattaca tctgtgtgtt ggttttttgt gtgaatcgat 120
agtactaaca tacgctctcc atcaaaacaa aacgaaacaa aacaaactag caaaataggc 180
tgtccccagt gcaagtgcag gtgccagaac atttctctgg cctaagtggc cggtaccgag 240
ctcgctagcc tcgaggatat cagatctggc ctcggcggcc aagcttggca atccggtact 300
gttggtaaag ccaccatgg 319



<210> 49
<211> 320 
<212> DNA 

<213> Artificial Sequence 



<220>

<223> A synthetic construct. 



<400> 49

actagtacgt ctctcaagga taagtaagta atattaaggt acgggaggta ttggacaggc 60
cgcaataaaa tatctttatt ttcattacat ctgtgtgttg gttttttgtg tgaatcgata 120
gtactaacat acgctctcca tcaaaacaaa acgaaacaaa acaaactagc aaaataggct 180
gtccccagtg caagtgcagg tgccagaaca tttctctggc ctaactggcc ggtacctgag 240
ctcgctagcc tcgaggatat caagatctgg cctcggcggc caagcttggc aatccggtac 300
tgttggtaaa gccaccatgg 320

<210> 50
<211> 5
<212> DNA

<213> Artificial Sequence

<220>

<223> A synthetic construct.

<400> 50

tataa 5

<210> 51 
<211> 6 
<212> DNA 
35<213> Artificial Sequence 



<220> 

<223> A synthetic construct. 



<400> 51
stratg 6
<210> 52
<211> 9
<212> DNA
<213> Artificial Sequence

<220>

<223> A synthetic construct.

<220>

<221> misc_feature

<222> 4, 6, 7

<223> n = A, T, C or G

<400> 52

mttncnnma 9 

<210> 53 
<211> 5 
<212> DNA

<213> Artificial Sequence

<220>

<223> A synthetic construct.

<400> 53

tratg 5

<210> 54
<211> 38
<212> DNA

<213> Artificial Sequence
<220>

<223> A synthetic construct.
<400> 54

gtactgagac gacgccagcc caagcttagg cctgagtg 38

<210> 55
<211> 38 
<212> DNA 

<213> Artificial Sequence 
<220>

<223> A synthetic construct.
<400> 55

ggcatgagcg tgaactgact gaactagcgg ccgccgag 38

<210> 56
<211> 24
<212> DNA
<213> Artificial Sequence

<220>

<223> A synthetic construct.
<400> 56

ggatcccatg gtgaagcgtg agaa 24 

<210> 57 

<211> 21 

<212> DNA

<213> Artificial Sequence

<220>

<223> A synthetic construct.

<400> 57 

ggatcccatg gtgaaacgcg a 21 



<210> 58 
<211> 31
<212> DNA

<213> Artificial Sequence
<220>

<223> A synthetic construct.
<400> 58

ctagcttttt tttctagata atcatgaaga c 31

<210> 59
<211> 32
<212> DNA

<213> Artificial Sequence
<220>

<223> A synthetic construct.

<400> 59

gcgtagccat ggtaaagcgt gagaaaaatg tc 32



<210> 60 
<211> 33 
<212> DNA 
<213> Artificial Sequence

<220>

<223> A synthetic construct.
<400> 60

ccgactctag attactaacc gccggccttc acc 33

<210> 61
<211> 54
<212> DNA

<213> Artificial Sequence 

<220> 

<223> A synthetic construct. 

<400> 61

caaaaagctt ggcattccgg tactgttggt aaagccacca tggtgaagcg agag 54

<210> 62
<211> 26
<212> DNA

<213> Artificial Sequence
<220>

<223> A synthetic construct.

<400> 62

caattgttgt tgttaacttg tttatt 26

<210> 63

<400> 63
000



WO 2006/034061    PCT/US2005/033218

<210> 64

<400> 64
000



<210> 65 
<211> 10 
<212> DNA 

<213> Artificial Sequence 

<220>

<223> A synthetic construct.
<400> 65

caccatggct 10

<210> 66 
<211> 40 
<212> DNA 
<213> Artificial Sequence

<220>

<223> A synthetic construct.
<400> 66

aaccatggct tccaaggtgt acgaccccga gcaacgcaaa 40 

<210> 67 
<211> 40 
<212> DNA

<213> Artificial Sequence

<220>

<223> A synthetic construct.

<400> 67

gctctagaat tactgctcgt tcttcagcac gcgctccacg 40

<210> 68
<211> 31
<212> DNA 

<213> Artificial Sequence 
<220>

<223> A synthetic construct.
<400> 68

cgctagccat ggcttcgaaa gtttatgatc c 31

<210> 69
<211> 25
<212> DNA
<213> Artificial Sequence

<220>

<223> A synthetic construct.
<400> 69

ggccagtaac tctagaatta ttgtt 25

<210> 70
<211> 1092
<212> DNA

<213> Artificial Sequence 



<220> 

<223> A synthetic construct. 


<400> 70 

aagcttgcta gcgccaccat gaagaagccc gagctcaccg ctaccagcgt tgaaaaattt 60

ctcatcgaga agttcgacag tgtgagcgac ctgatgcagt tgtcggaggg cgaagagagc 120

cgagccttca gcttcgatgt cggcggacgc ggctatgtac tgcgggtgaa tagctgcgct 180

gatggcttct acaaagaccg ctacgtgtac cgccacttcg ccagcgctgc actacccatc 240

cccgaagtgt tggacatcgg cgagttcagc gagagcctga catactgcat cagtagacgc 300

gcccaaggcg ttactctcca agacctcccc gaaacagagc tgcctgctgt gttacagcct 360

gtcgccgaag ctatggatgc tattgccgcc gccgacctca gtcaaaccag cggcttcggc 420

ccattcgggc cccaaggcat cggccagtac acaacctggc gggatttcat ttgcgccatt 480

gctgatcccc atgtctacca ctggcagacc gtgatggacg acaccgtgtc cgccagcgta 540

gctcaagccc tggacgaact gatgctgtgg gccgaagact gtcccgaggt gcgccacctc 600

gtccatgccg acttcggcag caacaacgtc ctgaccgaca acggccgcat caccgccgta 660

atcgactggt ccgaagctat gttcggggac agtcagtacg aggtggccaa catcttcttc 720

tggcggccct ggctggcttg catggagcag cagactcgct acttcgagcg ccggcatccc 780

gagctggccg gcagccctcg tctgcgagcc tacatgctgc gcatcggcct ggatcagctc 840

taccagagcc tcgtggacgg caacttcgac gatgctgcct gggctcaagg ccgctgcgat 900

gccatcgtcc gcagcggggc cggcaccgtc ggtcgcacac aaatcgctcg ccggagcgcc 960

gccgtatgga ccgacggctg cgtcgaggtg ctggccgaca gcggcaaccg ccggcccagt 1020

acacgaccgc gcgctaagga gggtggcgga gggagcggtg gcggaggttc ctacgtatag 1080
tctagactcg ag 1092



<210> 71 
<211> 1093
<212> DNA

<213> Artificial Sequence

<220>

<223> A synthetic construct.

<400> 71

aagcttgcta gcgccaccat gaagaagccc gagctcaccg ctaccagcgt tgaaaaattt 60

ctcatcgaga agttcgacag tgtgagcgac ctgatgcagt tgtcggaggg cgaagagagc 120

cgagccttca gcttcgatgt cggcggacgc ggctatgtac tgcgggtgaa tagctgcgct 180

gatggcttct acaaagaccg ctacgtgtac cgccacttcg ccagcgctgc actacccatc 240

cccgaagtgt tggacatcgg cgagttcagc gagagcctga catactgcat cagtagacgc 300

gcccaaggcg ttactctcca agacctcccc gaaacagagc tgcctgctgt gttacagcct 360

gtcgccgaag ctatggatgc tattgccgcc gccgacctca gtcaaaccag cggcttcggc 420

ccattcgggc cccaaggcat cggccagtac acaacctggc gggatttcat ttgcgccatt 480

gctgatcccc atgtctacca ctggcagacc gtgatggacg acaccgtgtc cgccagcgta 540

gctcaagccc tggacgaact gatgctgtgg gccgaagact gtcccgaggt gcgccacctc 600

gtccatgccg acttcggcag caacaacgtc ctgaccgaca acggccgcat caccgccgta 660

atcgactggt ccgaagctat gttcggggac agtcagtacg aggtggccaa catcttcttc 720

tggcggccct ggctggcttg catggagcag cagactcgct acttcgagcg ccggcatccc 780

gagctggccg gcagccctcg tctgcgagcc tacatgctgc gcatcggcct ggatcagctc 840

taccagagcc tcgtggacgg caacttcgac gatgctgcct gggctcaagg ccgctgcgat 900

gccatcgtcc gcagcggggc cggcaccgtc ggtcgcacac aaatcgctcg ccggagcgca 960

gccgtatgga ccgacggctg cgtcgaggtg ctggccgaca gcggcaaccg ccggcccagt 1020

acacgaccgc gcgctaagga aggcggtgga ggtagtggtg gcggaggtag ctacgtataa 1080

ctctagactc gag 1093



<210> 72 
<211> 813 
<212> DNA

<213> Artificial Sequence

<220>

<223> A synthetic construct.

<400> 72

gctagcgcca ccatgatcga acaagacggc ctccatgctg gcagtcccgc agcttgggtc 60
gaacgcttgt tcgggtacga ctgggcccag cagaccatcg gatgtagcga tgcggccgtg 120
i« w\_r^ w- \^ i« a. a


acac t~ raaacr 
y **y 


c ccf ac c ccr t a 


ctcrttccrtcia 


agaccgacct 




180 


v *-* 33 a. c«. y 




aCTCtacccac 


ctgagctggc 


tggccaccac 


cggtgtaccc 


240 


^y *-*y ^-*-y *-y 


tat* t"craatat 


fccr terse ccraa 


CJCCQQCCQQQ 


actaactQct 


actocicicqac* 


300 


at* ^*^*^ , 1"Cfc^f , ^ , 
y « ^ y ^ 


artna trhnrt* 
"y y u> w io 


aacicacrccac 


cttgcccccg 


ctgagaaggt 


ttccatcatg 


360 


gccgatgcaa


tgcggcgcct 


gcacaccctg 


yaCCCCyCUd 


—5 j*i « « m 4— 

CatyCCCCCC 


cgaccaccdy 


420


gctaagcatc 


ggatcgagcg 


tgctcggacc 


cgcatggagg 


ccggcctggt 


ggaccaggac 


480 


gacctggacg 


aggagcatca


gggcctggcc 


cccgctgaac 


tgttcgcccg 


cctgaaagcc 


540 


cgcatgccgg 


acggtgagga


cctggttgtg 


acacatggtg 


atgcctgcct 


ccctaacatc 


600 


atggtcgaga 


atggccgctt 


ctccggcttc 


atcgactgcg 


gtcgcctagg 


agttgccgac 


660 


cgctaccagg


acatcgccct 


ggccacccgc 


gacatcgctg 


aggagcttgg 


cggcgagtgg 


720 


gccgaccgct 


tcttagtctt 


gtacggcatc 


gcagctcccg 


acagccagcg 


catcgccttc 


780 


taccgcctgc 


tcgacgagtt 


cttttaatct 


aga 






813 



<210> 73 
<211> 816
<212> DNA

<213> Artificial Sequence
<220>

<223> A synthetic construct.

<400> 73

gctagcgcca ccatgatcga acaagacggc ctccatgctg gcagtcccgc agcttgggtc 60

gaacgcttgt tcgggtacga ctgggcccag cagaccatcg gatgtagcga tgcggccgtg 120

ttccgtctaa gcgctcaagg ccggcccgtg ctgttcgtga agaccgacct gagcggcgcc 180

ctgaacgagc ttcaagacga ggctgcccgc ctgagctggc tggccaccac cggcgtaccc 240

tgcgccgctg tgttggatgt tgtgaccgaa gccggccggg actggctgct gctgggcgag 300

gtccctggcc aggatctgct gagcagccac cttgcccccg ctgagaaggt ttctatcatg 360

gccgatgcaa tgcggcgcct gcacaccctg gaccccgcta cctgcccctt cgaccaccag 420

gctaagcatc ggatcgagcg tgctcggacc cgcatggagg ccggcctggt ggaccaggac 480

gacctggacg aggagcatca gggcctggcc cccgctgaac tgttcgcccg actgaaagcc 540

cgcatgccgg acggtgagga cctggttgtc acacacggag atgcctgcct ccctaacatc 600

atggtcgaga atggccgctt ctccggcttc atcgactgcg gtcgcctagg agttgccgac 660

cgctaccagg acatcgccct ggccacccgc gacatcgctg aggagcttgg cggcgagtgg 720

gccgaccgct tcttagtctt gtacggcatc gcagctcccg acagccagcg catcgccttc 780

taccgcttgc tcgacgagtt cttttaatga tctaga 816



<210> 74 
<211> 1252 
<212> DNA

<213> Artificial Sequence
<220>

<223> A synthetic construct. 



<400> 74 



ogcggccy cdd 


dLyLLdddLL 


ct^> ^y *» 01 y *— ^dy 


L L d. V^. CLy 


t taa tcacitcr 


acracaccaat 


60 


Cucagcy ate 


4- r pt"ai"t't*p 

LyUL L.CIL.UUL 


of* fr~ pnfpp 4- 

y L L.L<^LLU CX. L 




rtrrcrcitco 


tcrt"aaatcac 


120 


tacgattcgt 


gagggcttac


caLCdygc cc 




duyct i»y t^v—y 


yayayuuyuy 


180 


ttcaccggcc 


/"r ^ ^ *- +- 

cccyattuyL 


LayLddLy eld 


l. l. dy l l dy l ci 


/T/~f pf ci cine en 

yyyciyyy^uy 


dy uyaay da.y 


240


tggtcctgct 


—-k i* 4* 4* /--f ^ m 

actcugtccy 


cctccatcca


gLCtotyoyc 


uy^i«yL>u>yi.y 


atyL uayay u 


300


*1 *%. *^ *^ /i 4" 

!Uda9 aa^ ULCy 


tCay Lyay Ld 


yuu LLLy ct cty 


ay l L.y L.yy ll 


a4»4>f»p4-apt-fT 


arah rri^crcf t" 
y^QLuy Lyy l« 


360 


atcacgctcg


tcgttcggca 


LyyLLLL.y i— u 


UaaULULyy l 


t* rrrp of^nrTt - 
LUUVadyoyy i« 


c a a a r* f" cr a a t 
v^aay^uyyy i- 


420 


cacatgatca


cccatattat 


gaagaaatgc


agtcagctcc 


4- 4- a r~tnrt c* /-» 
l. L- ciy yyLLUL 


r>na 4- r^rrf" 4- f-r 4- 
LyctLL.yLi.yL 


480 


cagaagtaag 


ttggccgcgg


V" y— r 4~ 4* 4~ ^^^r 4— 

tgety UCy c l 


cauygcaauy 


rtnartpa /-^ 4- o 
yOayLdLLaL 


dLddLLL LL L. 




taccgtcatg 


ccatccgtaa 


gatgcttttc


cgtgaccggc 


gagtactcaa 


ccaagtcgtt


600 


ttgtgagtag


tgtatacggc 


gaccaagctg 


ctcttgcccg 


gcgtctatac


gggacaacac 


660 


cgcgccacat 


agcagtactt 


tgaaagtgct 


catcatcggg


aatcgttctt 


cggggcggaa


720 


agactcaagg 


atcttgccgc 


tattgagatc 


cagttcgata 


tagcccactc 


ttgcacccag 


780 


ttgatcttca 


gcatctttta 


ctttcaccag


cgtttcgggg 


tgtgcaaaaa 


caggcaagca 


840


aaatgccgca 


aagaagggaa 


tgagtgcgac 


acgaaaatgt 


tggatgctca


tactcgtcct 


900 


ttttcaatat


tattgaagca 


tttatcaggg 


ttactagtac 


gtctctcaag 


gataagtaag 


960 


taatattaag 


gtacgggagg 


tattggacag


gccgcaataa


aatatcttta


ttttcattac


1020


atctgtgtgt 


tggttttttg 


tgtgaatcga 


tagtactaac 


atacgctctc 


catcaaaaca 


1080


aaacgaaaca 


aaacaaacta 


gcaaaatagg 


ctgtccccag 


tgcaagtgca 


ggtgccagaa 


1140


catttctctg 


gcctaactgg 


ccggtacctg


agctcgctag


cctcgaggat 


atcaagatct 


1200 


ggcctcggcg


gccaagcttg 


gcaatccggt


actgttggta 


aagccaccat 


gg 


1252 


<210> 75 














<400> 75 














000














<210> 76 














<211> 228 














<212> DNA 














<213> Artificial Sequence










<220> 














<223> A synthetic construct. 










<400> 76














actagtcgtc tctcttgaga gaccgcgatc gccaccatga taagtaagta atattaaata 60
agtaaggcct gagtggccct cgagccagcc ttgagttggt tgagtccaag tcacgtctgg 120
agatctggta cctacgcgtg agctctacgt agctagcggc ctcggcggcc gaattcttgc 180
gatctaagta agcttggcat tccggtactg ttggtaaagc caccatgg 228

<210> 77 
<211> 228 
<212> DNA
<213> Artificial Sequence 

<220> 

<223> A synthetic construct. 


<400> 77 

actagtacgt ctctcttgag agaccgcgat cgccaccatg ataagtaagt aatattaaat 60
aagtaaggcc tgagtggccc tcgagtccag ccttgagttg gttgagtcca agtcacgtct 120
ggagatctgg taccttacgc gtagagctct acgtagctag cggcctcggc ggccgaattc 180
ttgcgatcta agcttggcaa tccggtactg ttggtaaagc caccatgg 228

<210> 78 
<211> 230 
<212> DNA 
<213> Artificial Sequence

<220> 

<223> A synthetic construct. 
<400> 78

actagtacgt ctctcttgag agaccgcgat cgcatgccta ggtaggtagt attagagcat 60
aggtagaggc ctaagtggcc ctcgagtcca gccttgagtt ggttgagtcc aagtcacgtc 120
tggagatctg gtaccttacg cgtatgagct ctacgtagct agcggcctcg gcggccgaat 180
tcttgcgatc taagcttggc aatccggtac tgttggtaaa gccaccatgg 230

<210> 79 
<211> 234 
<212> DNA 

<213> Artificial Sequence 


<220> 

<223> A synthetic construct. 
<400> 79 

actagtacgt ctctcttgag agaccgcgat cgccaccatg tctaggtagg tagtaaacga 60
aagggcttaa aggcctaagt ggccctcgag tccagccttg agttggttga gtccaagtca 120 
cgtttggaga tctggtacct tacgcgtatg agctctacgt agctagcggc ctcggcggcc 180 
gaattcttgc gatctaagct tggcaatccg gtactgttgg taaagccacc atgg 234 
<210> 80
<211> 938 
<212> DNA 

<213> Artificial Sequence 


<220> 

<223> A synthetic construct. 



<400> 80 

actagtaacc ctgataaatg cttcaataat attgaaaaag gaagagtatg agtattcaac 60
atttccgtgt cgcccttatt cccttttttg cggcattttg ccttcctgtt tttgctcacc 120
cagaaacgct ggtgaaagta aaagatgctg aagatcagtt gggtgcacga gtgggttaca 180
tcgaactgga tctcaacagc ggtaagatcc ttgagagttt tcgccccgaa gaacgttttc 240
caatgatgag cacttttaaa gttctgctat gtggcgcggt attatcccgt attgacgccg 300
ggcaagagca actcggtcgc cgcatacact attctcagaa tgacttggtt gagtactcac 360
cagtcacaga aaagcatctt acggatggca tgacagtaag agaattatgc agtgctgcca 420
taaccatgag tgataacacc gcggccaact tacttctgac aacgatcgga ggaccgaagg 480
agctaaccgc ttttttgcac aacatggggg atcatgtaac tcgccttgat cgttgggaac 540
cggagctgaa tgaagccata ccaaacgacg agcgtgacac cacgatgcct gtagcaatgg 600
caacaacgtt gcgcaaacta ttaactggcg aactacttac tctagcttcc cggcaacaat 660
taatagactg gatggaggcg gataaagttg caggaccact tctgcgctcg gcccttccgg 720
ctggctggtt tattgctgat aaatctggag ccggtgagcg tggctctcgc ggtatcattg 780
cagcactggg gccagatggt aagccctccc gtatcgtagt tatctacacg acggggagtc 840
aggcaactat ggatgaacga aatagacaga tcgctgagat aggtgcctca ctgattaagc 900
attggtaacc actgcagtgg ttttcctttt gcggccgc 938



<210> 81 

<211> 938 

<212> DNA 

<213> Artificial Sequence



<220> 

<223> A synthetic construct. 



<400> 81

actagtaacc ctgataaatg ctgcaaacat attgaaaaag gaagagtatg agfca-fctcaac 60
atttccgtgt cgcactcatt cccttctttg cggcattttg cttgcctgtt tttgcacacc 120
ccgaaacgct ggtgaaagta aaagatgctg aagatcaact gggtgcacga gtgggctata 180
tcgaactgga tctcaatagc ggtaagatcc ttgagagttt tcgccccgaa gaacgttttc 240
caatgatgag cacttttaaa gttctgctat gtggcgcggt attatcccgt attgacgccg 300
ggcaagagca gctcggtcgc cgcatacact actcacagaa cgacttggtt gagtactcgc 360
cggtcacgga aaagcatctt acggatggca tgacagtaag agaattgtgt agtgctgcca 420
taaccatgag tgataacacc gcggccaact tacttctgac aacgatcgga ggccctaagg 480
agctgaccgc atttttgcac aacatggggg atcatgtaac ccggcttgat cgttgggaac 540
cggagctgaa cgaagccata ccgaacgacg agcgtgacac cacgatgcct gtagcaatgg 600
caacaacgtt gcgcaaacta ctcactggcg aacttctcac tctagcatca cgacagcaac 660
tcatagactg gatggaggcg gataaagttg caggaccact tctgcgctcg gcccttccgg 720
ctggctggtt tatagctgat aaatccggtg ccggtgaacg cggctctcgc gggatcattg 780
ctgcgctggg gccagatggt aagccctcac gaatcgtagt tatctacacg acggggagtc 840
aggcaactat ggatgaacga aatagacaga tcgctgagat aggtgcctca ctgatcaagc 900
actggtagcc actgcagtgg tttagctttt gcggccgc 938

<210> 82
<211> 938
<212> DNA
<213> Artificial Sequence

<220>
<223> A synthetic construct.

<400> 82

actagtaacc ctgacaaatg ctgcaaacat attgaaaaag gaagagtatg agcatccaac 60
attttcgtgt cgcactcatt cccttctttg cggcattttg cttgcctgtt tttgcacacc 120
ccgaaacgct ggtgaaagta aaagatgctg aagatcaact gggtgcaaga gtgggctata 180
tcgaactgga tctcaatagc ggcaagatcc ttgagtcttt tcgccccgaa gaacgttttc 240
cgatgatgag cacttttaaa gttctgctat gtggcgcggt gttgtcccgt atagacgccg 300
ggcaagagca gcttggtcgc cgtatacact actcacaaaa cgacttggtt gagtactcgc 360
cggtcacgga aaagcatctt acggatggca tgacggtaag agaattgtgt agtgctgcca 420
ttaccatgag cgacaatacc gcggccaact tacttctgac aacgatcgga ggccctaagg 480
agctgaccgc atttttgcac aacatggggg atcatgtaac ccggcttgac cgctgggaac 540
cggagctgaa cgaagccata ccgaacgacg agcgtgacac cacgatgcct gtagcaatgg 600
caacaacgtt gcggaaacta ctcactggcg aacttctcac tctagcatca cgacagcagc 660
tcatagactg gatggaggcg gacaaagtag caggaccact tcttcgctcg gccctccctg 720
ctggctggtt cattgctgat aaatccggtg ccggtgaacg cggctctcgc gggatcattg 780
ctgcgctggg gcctgatggt aagccctcac gaatcgtagt aatctacacg acggggagtc 840
aggccactat ggacgaacga aatagacaga tcgctgagat cggtgcctca ctgatcaagc 900
actggtaacc actgcagtgg tttagcattt gcggccgc 938

<210> 83
<211> 938
<212> DNA
<213> Artificial Sequence

<220>
<223> A synthetic construct.

<400> 83

actagtaacc ctgacaaatg ctgcaaacat attgaaaaag gaagagtatg agcatccaac 60
attttcgtgt cgcactcatt cccttctttg cggcattttg cttgcctgtt tttgcacacc 120
ccgaaacgct ggtgaaagta aaagatgctg aagatcaact gggtgcaaga gtgggctata 180
tcgaactgga tctcaatagc ggcaagatcc ttgagtcttt ccgccccgaa gaacgttttc 240
cgatgatgag cactttcaaa gtactgctat gtggcgcggt gttgtcccgt atagacgccg 300
ggcaagagca gcttggtcgc cgtatacact actcacaaaa cgacttggtt gagtactcgc 360
cggtcacgga aaagcatctt acggatggca tgacggtaag agaattgtgt agtgctgcca 420
ttaccatgag cgataatacc gcggccaact tacttctgac aacgatcgga ggccctaagg 480
agctgaccgc atttttgcac aacatgggtg atcatgtgac ccggcttgac cgctgggaac 540
cggagctgaa cgaagccata ccgaacgacg agcgtgacac cacgatgcct gtagcaatgg 600
caacaactct tcggaaacta ctcactggcg aacttctcac tctagcatca cgacagcagc 660
tcatagactg gatggaggcg gacaaagtag caggaccact tcttcgctcg gccctccctg 720
ctggctggtt cattgctgat aaatctggag ccggtgagcg tggctctcgc ggtatcattg 780
ctgcgctggg gcctgatggt aagccctcac gaatcgtagt aatctacacg acggggagtc 840
aggccactat ggacgaacga aatagacaga tcgctgagat cggtgcctca ctgatcaagc 900
actggtaacc actgcagtgg tttagcattt gcggccgc 938

<210> 84 
<211> 938
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> A synthetic construct.
<400> 84 

actagtaacc ctgacaaatg ctgcaaacat attgaaaaag gaagagtatg agcatccaac 60
attttcgtgt cgcactcatt cccttctttg cggcattttg cttgcctgtt tttgcacacc 120
ccgaaacgct ggtgaaagta aaagatgctg aagatcaact gggtgcaaga gtgggctata 180
tcgaactgga tctcaatagc ggcaagatcc ttgagtcttt ccgccccgaa gaacgattcc 240
cgatgatgag cactttcaaa gtactgctat gtggcgcggt gttgtcccgt atagacgccg 300
ggcaagagca gcttggtcgc cgtatacact actcacaaaa cgacttggtt gagtactcgc 360
cggtcacgga aaagcatctt acggatggca tgacggtaag agaattgtgt agtgctgcca 420
ttaccatgag cgataatacc gcggccaact tacttctgac aacgatcgga ggccctaagg 480
agctgaccgc atttttgcac aacatgggtg atcatgtgac ccggcttgac cgctgggaac 540
cggagctgaa cgaagccata ccgaacgacg agcgtgatac cacgatgcca gtagcaatgg 600
ccacaactct tcggaaacta ctcactggcg aacttctcac tctagcatca cgacagcagc 660
tcatagactg gatggaggcg gacaaagtag caggaccact tcttcgctcg gccctccctg 720
ctggctggtt cattgctgac aaatccggtg ccggtgaacg cggctctcgc ggcatcattg 780
ctgcgctggg gcctgatggt aagccctcac gaatcgtagt aatctacacg acggggagtc 840
aggccactat ggacgaacga aatagacaga tcgctgagat cggtgcctca ctgatcaagc 900
actggtaacc actgcagtgg tttagcattt gcggccgc 938
<210> 85

<400> 85 
000 


<210> 86 

<400> 86 
000 


<210> 87 

<400> 87 
000 


<210> 88 
<211> 1038 
<212> DNA 

<213> Artificial Sequence 


<220> 

<223> A synthetic construct.



<400> 88 



atgaagaagc ccgaactcac cgctaccagc gttgaaaaat ttctcatcga gaagttcgac 60
agtgtgagcg acctgatgca gttgtcggag ggcgaagaga gccgagcctt cagcttcgat 120
gtcggcggac gcggctatgt actgcgggtg aatagctgcg ctgatggctt ctacaaagac 180
cgctacgtgt accgccactt cgccagcgct gcactaccca tccccgaagt gttggacatc 240
ggcgagttca gcgagagcct gacatactgc atcagtagac gcgcccaagg cgttactctc 300
caagacctcc ccgaaacaga gctgcctgct gtgttacagc ctgtcgccga agctatggat 360
gctattgccg ccgccgacct cagtcaaacc agcggcttcg gcccattcgg gccccaaggc 420
atcggccagt acacaacctg gcgggatttc atttgcgcca ttgctgatcc ccatgtctac 480
cactggcaga ccgtgatgga cgacaccgtg tccgccagcg tagctcaagc cctggacgaa 540
ctgatgctgt gggccgaaga ctgtcccgag gtgcgccacc tcgtccatgc cgacttcggc 600
agcaacaacg tcctgaccga caacggccgc atcaccgccg taatcgactg gtccgaagct 660
atgttcgggg acagtcagta cgaggtggcc aacatcttct tctggcggcc ctggctggct 720
tgcatggagc agcagactcg ctacttcgag cgccggcatc ccgagctggc cggcagccct 780
cgtctgcgag cctacatgct gcgcatcggc ctggatcagc tctaccagag cctcgtggac 840
ggcaacttcg acgatgctgc ctgggctcaa ggccgctgcg atgccatcgt ccgcagcggg 900
gccggcaccg tcggtcgcac acaaatcgct cgccggagcg cagccgtatg gaccgacggc 960
tgcgtcgagg tgctggccga cagcggcaac cgccggccca gtacacgacc gcgcgctaag 1020
gaggtaggtc gagtttaa 1038
<210> 89

<211> 4333 

<212> DNA 

<213> Artificial Sequence 


<220> 

<223> A synthetic construct. 



<400> 89 



1 f\cTcic*ci~ aarhn 


ciccaat' a rrt* 


y ci.y l. v«y w c* 


y vw uuy cty y u 


fcafceaacrafcn 


tggcctcggc 


60 




y y w a a. u w y y 


t* a <*"* t" nt" hcrot* 

i, a.*-> u-y l>uhHv 


ciciciy Vhrf. w ci w 


fcacraaciatcrc 


caaaaacatt 


120 


ci.ciyciciyyyL.i_. 


ciy ^-y 


pharrpaptr 

l_.t_.cll~<_rL.clL.u>l_, 


yacty ctuyyy d. 


ccnccncicciPi 
inV»y^ <— >y yuya 


erf acre fccrcac 


180 


dad^ s_. L« ex L. y ci 


ciy *_«y t-- l-o v»y *— 


v»» i_ i_y y Ly 


nnra ppa h r*n 
y y ci. ^ w cc 


cctttaccga 


cgcacatatc 


240 


y a yy *~y y cn—a. 


t~i~sicc*t~z*c , cic 
L. L. Ctv_. l_. Lctoy 


l. y ciy i— a. i_ l-l. 


y ciy auy cty ^ y 


ttcggctggc 


agaagctatg 


300 


j. o ct ciy LLaLy 


ctcic*)mz\ afar 
yyi^uycLctuci^ 


aaarpalr rrrr 
ctdct \* i_. q Luy y 


ciL.t_y L.yy uy l. 


gcagcgagaa


tagcttgcag


360 




c ci t* rr t* t" nnn 
v_ y uy u ^-yyy 




yy ^y ^ yy 


ctgtggcccc 


agctaacgac 


420 


Ct l_ CCXl^Cl.cLl_.y 


ciy y v_y cty w l. 


cic t"na araar 
y L^y a.ci^»cLy 


auyyy Lidu^ci 


gccagcccac 


cgtcgtattc


480 


frtrfarrpaa na 


aannrrrt"nra 

y 


n a trrtr 

cm ciy cl \* i_ 


a a rntrrra a a 

cim^y t-y wcicici 


agaagctacc 


gatcatacaa 


540 


aaoa trahra 
cxciy q uv^a Uvo 


t~ r*A t~crcf a haa 


r*A ACT AC CCfAP 
i»aciyciv»>*_<yciv_r 


taccaoaact 


tccaaagcat 


gtacaccttc 


600 


— * »3 w V* V- V» ^ ^ Vp> 


at - t" t~crr , r*Ar , f 


ccfcict tcaa c 


aaatacaach 


tcgtgcccga 


gagcttcgac 


660 


face ci a. c a a a a 


ppaf rnrrr t~ 

w ^ a. i_i^y w w t_ 


aatcatoaac 

y a u u c*. uy 


A_ci fc acrfc era c a 

ciy 1-o.y uyyvo 


gtaceggatt 


gcccaagggc 


720 


ahaaccctac 


CQcaccQcac 


cact fccrfccrfcc 


cgattcagtc 


atgcccgcga 


ccccatcttc 


780 


ggcaaccaga 


tcatccccga 


caccgctatc 


c t c aci cci t cicr 


tgccatttca


ccacggcttc 


840 




ccacachaaa 


ctacttaatc 


fccrccfcrcfcfcfcc 

^ Z3 W I*- w 


gggtcgtgct 


catgtaccgc 


900 


ttcgaggagg


agctattctt 


gcgcagcttg 


caagactata 


agattcaatc 


tgccctgctg 


960 


gtgcccacac 


tatttagctt 


cttcgctaag


agcactctca 


tcgacaagta 


cgacctaagc 


1020 


aacttgcacg


agatcgccag


cggcggggcg


ccgctcagca


aggaggtagg 


tgaggccgtg 


1080


gccaaacgct 


tccacctacc 


aggcatccgc 


cagggctacg 


gcctgacaga


aacaaccagc 


1140 


gccattctga


tcacccccga 


aggggacgac 


aagcctggcg 


cagtaggcaa ggtggtgccc 


1200 


ttcttcgagg


ctaaggtggt 


ggacttggac 


accggtaaga


cactgggtgt 


gaaccagcgc 


1260 


ggcgagctgt 


gcgtccgtgg 


ccccatgatc 


atgagcggct


acgttaacaa 


ccccgaggct 


1320 


acaaacgctc 


tcatcgacaa 


ggacggctgg 


ctgcacagcg 


gcgacatcgc 


ctactgggac 


1380


gaggacgagc 


acttcttcat 


cgtggaccgg 


ctgaagagcc


tgatcaaata 


caagggctac 


1440 


caggtagccc 


cagccgaact 


ggagagcatc 


ctgctgcaac 


accccaacat 


cttcgacgcc 


1500 


ggggtcgccg


gcctgcccga 


cgacgatgcc 


ggcgagctgc 


ccgccgcagt 


cgtcgtgctg 


1560 


gaacacggta 


aaaccatgac 


cgagaaggag 


atcgtggact 


atgtggccag 


ccaggttaca 


1620 


accgccaaga 


agctgcgcgg


tggtgttgtg 


ttcgtggacg 


aggtgcctaa 


aggactgacc 


1680 


ggcaagttgg 


acgcccgcaa 


gatccgcgag


attctcatta 


aggccaagaa


gggcggcaag 


1740 


atcgccgtgt 


aataattcta 


gagtcggggc


ggccggccgc 


ttcgagcaga 


catgataaga 


1800 


tacattgatg


agtttggaca 


aaccacaact 


agaatgcagt


gaaaaaaatg 


ctttatttgt 


1860 


gaaatttgtg 


atgctattgc 


tttatttgta 


accattataa 


gctgcaataa


acaagttaac 


1920 


aacaacaatt 


gcattcattt 


tatgtttcag 


gttcaggggg 


aggtgtggga 


ggttttttaa 


1980 


agcaagtaaa 


acctctacaa 


atgtggtaaa 


atcgataagg 


atccgtcgac 


cgatgccctt 


2040 
gagagccttc aacccagtca gctccttccg gtgggcgcgg ggcatgacta tcgtcgccgc 2100
acttatgact gtcttcttta tcatgcaact cgtaggacag gtgccggcag cgctcttccg 2160
cttcctcgct cactgactcg ctgcgctcgg tcgttcggct gcggcgagcg gtatcagctc 2220
actcaaaggc ggtaatacgg ttatccacag aatcagggga taacgcagga aagaacatgt 2280
gagcaaaagg ccagcaaaag gccaggaacc gtaaaaaggc cgcgttgctg gcgtttttcc 2340
ataggctccg cccccctgac gagcatcaca aaaatcgacg ctcaagtcag aggtggcgaa 2400
acccgacagg actataaaga taccaggcgt ttccccctgg aagctccctc gtgcgctctc 2460
ctgttccgac cctgccgctt accggatacc tgtccgcctt tctcccttcg ggaagcgtgg 2520
cgctttctca tagctcacgc tgtaggtatc tcagttcggt gtaggtcgtt cgctccaagc 2580
tgggctgtgt gcacgaaccc cccgttcagc ccgaccgctg cgccttatcc ggtaactatc 2640
gtcttgagtc caacccggta agacacgact tatcgccact ggcagcagcc actggtaaca 2700
ggattagcag agcgaggtat gtaggcggtg ctacagagtt cttgaagtgg tggcctaact 2760
acggctacac tagaagaaca gtatttggta tctgcgctct gctgaagcca gttaccttcg 2820
gaaaaagagt tggtagctct tgatccggca aacaaaccac cgctggtagc ggtggttttt 2880
ttgtttgcaa gcagcagatt acgcgcagaa aaaaaggatc tcaagaagat cctttgatct 2940
tttctacggg gtctgacgct cagtggaacg aaaactcacg ttaagggatt ttggtcatga 3000
gattatcaaa aaggatcttc acctagatcc ttttaaatta aaaatgaagt tttaaatcaa 3060
tctaaagtat atatgagtaa acttggtctg acagcggccg caaatgctaa accactgcag 3120
tggttaccag tgcttgatca gtgaggcacc gatctcagcg atctgcctat ttcgttcgtc 3180
catagtggcc tgactccccg tcgtgtagat cactacgatt cgtgagggct taccatcagg 3240
ccccagcgca gcaatgatgc cgcgagagcc gcgttcaccg gcccccgatt tgtcagcaat 3300
gaaccagcca gcagggaggg ccgagcgaag aagtggtcct gctactttgt ccgcctccat 3360
ccagtctatg agctgctgtc gtgatgctag agtaagaagt tcgccagtga gtagtttccg 3420
aagagttgtg gccattgcta ctggcatcgt ggtatcacgc tcgtcgttcg gtatggcttc 3480
gttcaactct ggttcccagc ggtcaagccg ggtcacatga tcacccatat tatgaagaaa 3540
tgcagtcagc tccttagggc ctccgatcgt tgtcagaagt aagttggccg cggtgttgtc 3600
gctcatggta atggcagcac tacacaattc tcttaccgtc atgccatccg taagatgctt 3660
ttccgtgacc ggcgagtact caaccaagtc gttttgtgag tagtgtatac ggcgaccaag 3720
ctgctcttgc ccggcgtcta tacgggacaa caccgcgcca catagcagta ctttgaaagt 3780
gctcatcatc gggaatcgtt cttcggggcg gaaagactca aggatcttgc cgctattgag 3840
atccagttcg atatagccca ctcttgcacc cagttgatct tcagcatctt ttactttcac 3900
cagcgtttcg gggtgtgcaa aaacaggcaa gcaaaatgcc gcaaagaagg gaatgagtgc 3960
gacacgaaaa tgttggatgc tcatactcgt cctttttcaa tattattgaa gcatttatca 4020
gggttactag tacgtctctc aaggataagt aagtaatatt aaggtacggg aggtattgga 4080
caggccgcaa taaaatatct ttattttcat tacatctgtg tgttggtttt ttgtgtgaat 4140
cgatagtact aacatacgct ctccatcaaa acaaaacgaa acaaaacaaa ctagcaaaat 4200
aggctgtccc cagtgcaagt gcaggtgcca gaacatttct ctaagtaata ttaaggtacg 4260
ggaggtattg gacaggccgc aataaaatat ctttattttc attacatctg tgtgttggtt 4320
ttttgtgtga atc 4333
<210> 90
<211> 3522 
<212> DNA 

<213> Artificial Sequence 


<220> 

<223> A synthetic construct. 



<400> 90 

ggcctaactg gccggtacct gagctcgcta gcctcgagga tatcaagatc tggcctcggc 60
ggccaagctt ggcaatccgg tactgttggt aaagccacca tggcttccaa ggtgtacgac 120
cccgagcaac gcaaacgcat gatcactggg n c t c aa t era t gggctcgctg caagcaaatg 180
aacgtgctgg actccttcat caactactat gattccgaga agcacgccga gaacgccgtg 240
atttttctgc atggtaacgc tgcctccagc tacctataaa ggcacgtcgt gcctcacatc 300
gagcccgtgg ctagatgcat catccctgat cfccjcifcccjcjcici tgggtaagtc cggcaagagc 360
gggaatggct catatcgcct cctggatcac tacaaahacc tcaccgcttg gttcgagctg 420
ctgaaccttc caaagaaaat catctttgtg ggccacgact ggggggcttg tctggccttt 480
cactactcct acgagcacca agacaagatc aaoaccatccr tccatgctga gagtgtcgtg 540
gacgtgatcg agtcctggga cgagtggcct aacatcaaaa aggatatcgc cctgatcaag 600
agcgaagagg gcgagaaaat ggtgcttgag aataacttct tcgtcgagac catgctccca 660
agcaagatca tgcggaaact ggagcctgag gagttcgctg cctacctgga gccattcaag 720
gagaagggcg aggttagacg gcctaccctc tcctggcctc gcgagatccc tctcgttaag 780
ggaggcaagc ccgacgtcgt ccagattgtc cgcaactaca acgcctacct tcgggccagc 840
gacgatctgc ctaagatgtt catcgagtcc qaccctcjacrt tcttttccaa cgctattgtc 900
gagggagcta agaagttccc taacaccgag ttcgtgaagg tgaagggcct ccacttcagc 960
caggaggacg ctccagatga aatgggtaag tacatcaaga gcttcgtgga gcgcgtgctg 1020
aagaacgagc agtaattcta gagtcggggc ggccggccgc ttcgagcaga catgataaga 1080
tacattgatg agtttggaca aaccacaact agaatgcagt gaaaaaaatg ctttatttgt 1140
gaaatttgtg atgctattgc tttatttgta accattataa gctgcaataa acaagttaac 1200
aacaacaatt gcattcattt tatgtttcag gttcaggggg aggtgtggga ggttttttaa 1260
agcaagtaaa acctctacaa atgtggtaaa atcgataagg atccgtcgac cgatgccctt 1320
gagagccttc aacccagtca gctccttccg gtgggcgcgg ggcatgacta tcgtcgccgc 1380
acttatgact gtcttcttta tcatgcaact cgtaggacag gtgccggcag cgctcttccg 1440
cttcctcgct cactgactcg ctgcgctcgg tcgttcggct gcggcgagcg gtatcagctc 1500
actcaaaggc ggtaatacgg ttatccacag aatcagggga taacgcagga aagaacatgt 1560
gagcaaaagg ccagcaaaag gccaggaacc gtaaaaaggc cgcgttgctg gcgtttttcc 1620
ataggctccg cccccctgac gagcatcaca aaaatcgacg ctcaagtcag aggtggcgaa 1680
acccgacagg actataaaga taccaggcgt ttccccctgg aagctccctc gtgcgctctc 1740
ctgttccgac cctgccgctt accggatacc tgtccgcctt tctcccttcg ggaagcgtgg 1800
cgctttctca tagctcacgc tgtaggtatc tcagttcggt gtaggtcgtt cgctccaagc 1860
tgggctgtgt gcacgaaccc cccgttcagc ccgaccgctg cgccttatcc ggtaactatc 1920
gtcttgagtc caacccggta agacacgact tatcgccact ggcagcagcc actggtaaca 1980
ggattagcag agcgaggtat gtaggcggtg ctacagagtt cttgaagtgg tggcctaact 2040
acggctacac tagaagaaca gtatttggta tctgcgctct gctgaagcca gttaccttcg 2100
gaaaaagagt tggtagctct tgatccggca aacaaaccac cgctggtagc ggtggttttt 2160
ttgtttgcaa gcagcagatt acgcgcagaa aaaaaggatc tcaagaagat cctttgatct 2220
tttctacggg gtctgacgct cagtggaacg aaaactcacg ttaagggatt ttggtcatga 2280
gattatcaaa aaggatcttc acctagatcc ttttaaatta aaaatgaagt tttaaatcaa 2340
tctaaagtat atatgagtaa acttggtctg acagcggccg caaatgctaa accactgcag 2400
tggttaccag tgcttgatca gtgaggcacc gatctcagcg atctgcctat ttcgttcgtc 2460
catagtggcc tgactccccg tcgtgtagat cactacgatt cgtgagggct taccatcagg 2520
ccccagcgca gcaatgatgc cgcgagagcc gcgttcaccg gcccccgatt tgtcagcaat 2580
gaaccagcca gcagggaggg ccgagcgaag aagtggtcct gctactttgt ccgcctccat 2640
ccagtctatg agctgctgtc gtgatgctag agtaagaagt tcgccagtga gtagtttccg 2700
aagagttgtg gccattgcta ctggcatcgt ggtatcacgc tcgtcgttcg gtatggcttc 2760
gttcaactct ggttcccagc ggtcaagccg ggtcacatga tcacccatat tatgaagaaa 2820
tgcagtcagc tccttagggc ctccgatcgt tgtcagaagt aagttggccg cggtgttgtc 2880
gctcatggta atggcagcac tacacaattc tcttaccgtc atgccatccg taagatgctt 2940
ttccgtgacc ggcgagtact caaccaagtc gttttgtgag tagtgtatac ggcgaccaag 3000
ctgctcttgc ccggcgtcta tacgggacaa caccgcgcca catagcagta ctttgaaagt 3060
gctcatcatc gggaatcgtt cttcggggcg gaaagactca aggatcttgc cgctattgag 3120
atccagttcg atatagccca ctcttgcacc cagttgatct tcagcatctt ttactttcac 3180
cagcgtttcg gggtgtgcaa aaacaggcaa gcaaaatgcc gcaaagaagg gaatgagtgc 3240
gacacgaaaa tgttggatgc tcatactcgt cctttttcaa tattattgaa gcatttatca 3300
gggttactag tacgtctctc aaggataagt aagtaatatt aaggtacggg aggtattgga 3360
caggccgcaa taaaatatct ttattttcat tacatctgtg tgttggtttt ttgtgtgaat 3420
cgatagtact aacatacgct ctccatcaaa acaaaacgaa acaaaacaaa ctagcaaaat 3480
aggctgtccc cagtgcaagt gcaggtgcca gaacatttct ct 3522



<210> 91 
<211> 621 
<212> DNA 
<213> Artificial Sequence

<220> 

<223> A synthetic construct. 

<400> 91

gctagcgcca ccatgaccga gtacaagccc accgtgcgcc tggccacccg cgacgacgtg 60
ccccgcgccg tgcgcaccct ggccgccgcc ttcgccgact accccgccac ccgccacacc 120
gtggaccccg accgccacat cgagcgcgtg accgagctgc aggagctgtt cctgacccgc 180
gtgggcctgg acatcggcaa ggtgtgggtg gccgacgacg gcgccgccgt ggccgtgtgg 240
accacccccg agagcgtgga ggccggcgcc gtgttcgccg agatcggccc ccgcatggcc 300
gagctgagcg gcagccgcct ggccgcccag cagcagatgg agggcctgct ggccccccac 360
cgccccaagg agcccgcctg gttcctggcc accgtgggcg tgagccccga ccaccagggc 420
aagggcctgg gcagcgccgt ggtgctgccc ggcgtggagg ccgccgagcg cgccggcgtg 480
cccgccttcc tggagaccag cgccccccgc aacctgccct tctacgagcg cctgggcttc 540
accgtgaccg ccgacgtgga ggtgcccgag ggcccccgca cctggtgcat gacccgcaag 600
cccggcgcct aatgatctag a 621

<210> 92
<211> 621 
<212> DNA 

<213> Artificial Sequence 


<220> 

<223> A synthetic construct. 
<400> 92 

gctagcgcca ccatgaccga gtacaagcct accgtgcgcc tggccactcg cgatgatgtg 60
ccccgcgccg tccgcactct ggccgccgct ttcgccgact accccgctac ccggcacacc 120
gtggaccccg accggcacat cgagcgtgtg acagagttgc aggagctgtt cctgacccgc 180
gtcgggctgg acatcggcaa ggtgtgggta gccgacgacg gcgcggccgt ggccgtgtgg 240
actacccccg agagcgttga ggccggcgcc gtgttcgccg agatcggccc ccgaatggcc 300
gagctgagcg gcagccgcct ggccgcccag cagcaaatgg agggcctgct tgccccccat 360
cgtcccaagg agcccgcctg gtttctggcc actgtaggag tgagccccga ccaccagggc 420
aagggcttgg gcagcgccgt cgtgttgccc ggcgtagagg ccgccgaacg cgccggtgtg 480
cccgcctttc tggagacaag cgctccgcgt aaccttccat tctacgagcg cctgggcttc 540
accgtgaccg ccgatgtcga ggtgcccgag ggaccccgga cctggtgcat gactcgcaag 600
cctggcgcct aatgatctag a 621

<210> 93 
<211> 621 
<212> DNA 
<213> Artificial Sequence

<220> 

<223> A synthetic construct. 
<400> 93

gctagcgcca ccatgaccga gtacaagcct accgtgcgcc tggccactcg cgatgatgtg 60
ccccgcgccg tccgcactct ggccgccgct ttcgccgact accccgctac ccggcacacc 120
gtggaccccg accggcacat cgagcgtgtg acagagttgc aggagctgtt cctgacccgc 180
gtcgggctgg acatcggcaa ggtgtgggta gccgacgacg gcgcggccgt ggccgtgtgg 240
actacccccg agagcgttga ggccggcgcc gtgttcgccg agatcggccc ccgaatggcc 300
gagctgagcg gcagccgcct ggccgcccag cagcaaatgg agggcctgct tgccccccat 360
cgtcccaagg agcctgcctg gtttctggcc actgtaggag tgagccccga ccaccagggc 420
aagggcttgg gcagcgccgt cgtgttgccc ggcgtagagg ccgccgaacg cgccggtgtg 480
cccgcctttc tcgaaacaag cgcaccaaga aaccttccat tctacgagcg cctgggcttc 540
accgtgaccg ccgatgtcga ggtgcccgag ggacctagga cctggtgtat gacacgaaaa 600
cctggcgcct aatgatctag a 621



<210> 94
<211> 1672 
<212> DNA 

<213> Artificial Sequence 

<220>

<223> A synthetic construct. 



<400> 94 

aaagccacca tggaagatgc caaaaacatt 

gaagatggga ctgctggcga gcaacttcac
gggacaattg cgttcacgga tgctcacatt 
gagatgtcgg tgcggctggc agaagctatg 
attgtagtgt gcagtgagaa ctcgttgcag 
atcggggtgg ctgtggctcc tgctaacgac 

20atggggatct ctcagcctac agtggtgttt 
aatgtgcaaa agaagctgcc tattatacaa 
taccaggggt ttcagtccat gtacacattt 
gagtacgact tcgtgcccga gtctttcgac 
agctccgggt ctaccgggct gcctaagggt 

25agattctctc atgccaggga cccgatcttt 
ctgtcggtgg tgccctttca tcatgggttt 
tgcgggttta gagtggtgct catgtatagg 
caagattata agattcagtc tgctctgctg 
tctacgctca tagacaagta tgacttgtcc 

3 0cctctgtcta aggaggtagg tgaggctgtg 
caggggtacg ggctaacaga aacaacttct 
aaacccgggg ctgtagggaa agtggtgccc 
accggtaaga cactaggggt gaaccagcgt 
atgtcggggt acgttaacaa ccccgaagct 

35cttcatagcg gcgacattgc ctactgggac 
ctgaagtcgt tgatcaaata caaggggtat 
ctgcttcaac accccaatat cttcgatgct 
ggagagctgc ctgctgctgt agtagtgctt 
atcgtggatt atgtggcttc acaagtgaca 

40tttgtggatg aggtgcctaa agggctcact 
attctcatta aggctaagaa gggtggaaag 



aayaayygyt- 


l*L>yi»l»la>V»V«>l«l» 


ctaccctct t 


60 


ciacLy w l. a. uy ct 


a fr f a a t* a t* cj c 


tcttgtgcca 


120 


y a ciy Lrdy ci^a, 


tcacatacac 


tgagtatttt 


180 


aagcgctatg 


ggctgaatac 


aaaccataga 


240 


ttctttatgc 


ccgtgctggg 


ggctctcttc 


300 


atctacaacg 


agcgagagct 


attcraactcci 


360 


gtgagtaaga 


aagggcttca 


aaagattctc 


420 


aagattatta 


ttatggactc 


h a ana pana r< 
u d ciy d ciy ct 


480 


gtaacctctc 


atctgcctcc 


tggcttcaac 


540 


agggacaaaa 


cgattgctct 


gatcatgaac 


600 


gtagctctgc 


cccatcgaac 


agcttgtgtg 


660 


ggaaaccaga 


tcatccctga 


cactgctatt 


720 


gggatgttca 


caacactggg 


atacctcatt 


780 


tttgaagaag 


aactattcct 


acgctctttg 


840 


gtgccaacac 


tattctcttt 


ttttgctaag 


900 


aacttgcacg 


agattgcttc 


tggcggagca 


960 


gctaagcgct 


ttcatctgcc 


tggtatcaga 


1020 


gctattctga 


ttacaccaga 


gggcgatgac 


1080 


ttttttgaag 


ccaaagtagt 


tgatcttgat 


1140 


ggtgaactgt 


gtgtgcgggg 


ccctatgatt 


1200 


acaaatgctc 


tcatagacaa 


ggacgggtgg 


1260 


gaggatgagc 


atttcttcat 


cgtggacaga 


1320 


caagtagctc 


ctgccgagct 


tgagtccatt 


1380 


ggggtggctg ggctgcctga 


tgatgatgct 


1440 


gagcatggta 


agacaatgac 


agagaaggag 


1500 


acagctaaga 


aactccgagg 


tggcgttgtg 


1560 


ggcaagctgg 


atgccagaaa 


aattcgagag 


1620 


attgctgtgt 


aatagttcta 


ga 


1672 
<210> 95
<211> 1166 
<212> DNA 

<213> Artificial Sequence 


<220> 

<223> A synthetic construct. 



<400> 95 



gcggccgcaa


auyCLdaaCC 


act.ycciyt.yy 


t-f ar r«a rt*h nr 
u tut uay uy<- 


ttgatcagtg 


aggcaccgat 


60 


ctcagcgatc 


tgtctatttc


r~r +- 4- /*■« /"it" f» f"* 3 t~ 

y l. ucyucccti- 


cty ^-y y L-y d 


ctccccgtcg 


tgtagattac 


120 


tacgattcgt 


y ay y y c 1. t die 


cd L.cdyyL* en- 


uciy ^*y cciy \_>ct 


atgatgccgc


gagagccgcg


180 


ttcaccggca 


ccggatttgt 


cagcaatgaa


ccagccagca 


gggagggccg 


agcgaagaag




tggtcctgct 


actttgtccg 


cctccatcca 


gtctatgagc 


tgctgtcgtg 


atgctagagt 


300 


gagaagttcg


ccagtgagta 


gtttccgaag 


agttgtggcc 


attgctactg gcatcgtggt


360 


atcacgctcg 


tcgttcggta 


tggcttcgtt 


cagctccggt 


tcccagcggt 


caagccgggt


420 


cacatgatca 


cccatgttgt 


gcaaaaatgc


ggtcagctcc 


ttagggcctc


cgatcgttgt 


480 


cagaagtaag 


ttggccgcgg 


tattatcgct


catggtaatg 


gcagcactac 


acaattctct 


540 


taccgtcatg 


ccatccgtaa 


gatgcttttc


cgtgaccggc 


gagtactcaa 


ccaagtcgtt


600 


ttgtgagtag


tgtatacggc 


gaccaagctg 


ctcttgcccg 


gcgtctatac


gggacaacac 


660 


cgcgccacat 


agcagtactt 


tgaaagtgct 


catcatcggg


aatcgttctt 


cggggcggaa


720 


agactcaagg 


atcttgccgc 


tattgagatc 


cagttcgata 


tagcccactc 


ttgcacccag 


780 


ttgatcttca 


gcatctttta 


ctttcaccag 


cgtttcgggg 


tgtgcaaaaa 


caggcaagca 


840 


aaatgccgca 


aagaagggaa 


tgagtgcgac 


acgaaaatgt 


tggatgctca


tactcttcct 


900 


ttttcaatat


gtttgcagca 


tttgtcaggg 


ttactagtac 


gtctctcttg 


agagaccgcg


960 


atcgccacca 


tgtctaggta 


ggtagtaaac 


gaaagggctt 


aaaggcctaa gtggccctcg 


1020 


agtccagcct 


tgagttggtt 


gagtccaagt 


cacgtttgga 


gatctggtac 


cttacgcgta


1080 


tgagctctac 


gtagctagcg 


gcctcggcgg 


ccgaattctt 


gcgatctaag 


cttggcaatc 


1140 


cggtactgtt 


ggtaaagcca 


ccatgg 








1166 
<210> 96
<211> 1166 
<212> DNA 

<213> Artificial Sequence 


<220> 

<223> A synthetic construct. 



<400> 96 

gcggccgcaa atgctaaacc actgcagtgg ttaccagtgc ttgatcagtg aggcaccgat 60
ctcagcgatc tgtctatttc gttcgtccat agtggcctga ctccccgtcg tgtagattac 120
tacgattcgt gagggcttac catcaggccc cagcgcagca atgatgccgc gagagccgcg 180
ttcaccggcc cccgatttgt cagcaatgaa ccagccagca gggagggccg agcgaagaag 240
tggtcctgct [illegible] [illegible] gtctatgagc tgctgtcgtg atgctagagt 300
aagaagttcg [illegible] [illegible] agttgtggcc attgctactg gcatcgtggt 360
atcacgctcg [illegible] [illegible] caactccggt tcccagcggt [illegible] 420
[illegible] [illegible] [illegible] ggtcagctcc ttagggcctc cgatcgttgt 480
cagaagtaag ttggccgcgg [illegible] catggtaatg gcagcactac acaattctct 540
taccgtcatg [illegible] [illegible] cgtgaccggc gagtactcaa ccaagtcgtt 600
ttgtgagtag [illegible] [illegible] ctcttgcccg gcgtctatac gggacaacac 660
cgcgccacat [illegible] [illegible] catcatcggg aatcgttctt cggggcggaa 720
agactcaagg atcttgccgc tattgagatc cagttcgata tagcccactc ttgcacccag 780
ttgatcttca gcatctttta ctttcaccag cgtttcgggg tgtgcaaaaa caggcaagca 840
aaatgccgca aagaagggaa tgagtgcgac acgaaaatgt tggatgctca tactcttcct 900
ttttcaatat gtttgcagca tttgtcaggg ttactagtac gtctctcttg agagaccgcg 960
atcgccacca tgtctaggta ggtagtaaac gaaagggctt aaaggcctaa gtggccctcg 1020
agtccagcct tgagttggtt gagtccaagt cacgtttgga gatctggtac cttacgcgta 1080
tgagctctac gtagctagcg gcctcggcgg ccgaattctt gcgttcgaag cttggcaatc 1140
cggtactgtt ggtaaagcca ccatgg 1166



<210> 97
<211> 1166
<212> DNA
<213> Artificial Sequence

<220>
<223> A synthetic construct.

<400> 97

gcggccgcaa atgctaaacc actgcagtgg ttaccagtgc ttgatcagtg aggcaccgat 60
ctcagcgatc tgcctatttc gttcgtccat agtggcctga ctccccgtcg tgtagatcac 120
tacgattcgt gagggcttac catcaggccc cagcgcagca atgatgccgc gagagccgcg 180
ttcaccggcc cccgatttgt cagcaatgaa ccagccagca gggagggccg agcgaagaag 240
tggtcctgct actttgtccg cctccatcca gtctatgagc tgctgtcgtg atgctagagt 300
aagaagttcg ccagtgagta gtttccgaag agttgtggcc attgctactg gcatcgtggt 360
atcacgctcg tcgttcggta tggcttcgtt caactctggt tcccagcggt caagccgggt 420
cacatgatca cccatgttgt gcaaaaatgc ggtcagctcc ttagggcctc cgatcgttgt 480
cagaagtaag ttggccgcgg tgttgtcgct catggtaatg gcagcactac acaattctct 540
taccgtcatg ccatccgtaa gatgcttttc cgtgaccggc gagtactcaa ccaagtcgtt 600
ttgtgagtag tgtatacggc gaccaagctg ctcttgcccg gcgtctatac gggacaacac 660
cgcgccacat agcagtactt tgaaagtgct catcatcggg aatcgttctt cggggcggaa 720
agactcaagg atcttgccgc tattgagatc cagttcgata tagcccactc ttgcacccag 780
ttgatcttca gcatctttta ctttcaccag cgtttcgggg tgtgcaaaaa caggcaagca 840
aaatgccgca aagaagggaa tgagtgcgac acgaaaatgt tggatgctca tactcttcct 900
ttttcaatat gtttgcagca tttgtcaggg ttactagtac gtctctcttg agagaccgcg 960
atcgccacca tgtctaggta ggtagtaaac gaaagggctt aaaggcctaa gtggccctcg 1020
agtccagcct tgagttggtt gagtccaagt cacgtttgga gatctggtac cttacgcgta 1080
tgagctctac gtagctagcg gcctcggcgg ccgaattctt gcgttcgaag cttggcaatc 1140
cggtactgtt ggtaaagcca ccatgg 1166
* There is 85.82% identity of .1147:295 in 3095 nt overlap (total 4818 nt) with SEQ ID NO: 89 (4333 nt) of the present application *

abstract
DATABASE EMBL 15 May 2001 (2001-05-15), ZHUANG, Y. ET AL.: "Co-reporter vector phRG-B, complete sequence", XP002371237, retrieved from EBI, Database accession no. AF362550

* There is 98.82% identity of AF362550 in 2375 nt overlap (total 4101 nt) with SEQ ID NO: 90 (3522 nt) of the present application *

abstract
Box II Observations where certain claims were found unsearchable (Continuation of item 2 of first sheet)

This International Search Report has not been established in respect of certain claims under Article 17(2)(a) for the following reasons:

1. [ ] Claims Nos.:
because they relate to subject matter not required to be searched by this Authority, namely:

2. [X] Claims Nos.: 1-10, 12-14, 16-30, 32-46, 48, 53, 55-62, 68-69
because they relate to parts of the International Application that do not comply with the prescribed requirements to such an extent that no meaningful International Search can be carried out, specifically:
see FURTHER INFORMATION sheet PCT/ISA/210

3. [ ] Claims Nos.:
because they are dependent claims and are not drafted in accordance with the second and third sentences of Rule 6.4(a).

Box III Observations where unity of invention is lacking (Continuation of item 3 of first sheet)

This International Searching Authority found multiple inventions in this international application, as follows:

see additional sheet

1. [ ] As all required additional search fees were timely paid by the applicant, this International Search Report covers all searchable claims.

2. [ ] As all searchable claims could be searched without effort justifying an additional fee, this Authority did not invite payment of any additional fee.

3. [X] As only some of the required additional search fees were timely paid by the applicant, this International Search Report covers only those claims for which fees were paid, specifically claims Nos.:
11 and 15 (partially), 47, 49 and 50 (partially), 63-67 (partially)

4. [ ] No required additional search fees were timely paid by the applicant. Consequently, this International Search Report is restricted to the invention first mentioned in the claims; it is covered by claims Nos.:

Remark on Protest
[ ] The additional search fees were accompanied by the applicant's protest.
[X] No protest accompanied the payment of additional search fees.
Continuation of Box II.2

Claims Nos.: 1-10, 12-14, 16-30, 32-46, 48, 53, 55-62, 68-69 



The present application contains 69 claims, of which 7 claims are 
independent. They are drafted in such a way that the claims as a whole 
are not in compliance with the provisions of clarity and conciseness of 
Article 6 PCT, as they erect a smoke screen in front of the skilled 
reader when assessing the intended scope of protection. In view of the 
fact that the starting (parent) nucleic acid sequences are not defined 
in most claims, it is impossible for the skilled reader to determine the 
subject-matter for which protection is sought. The non-compliance with 
the substantive provisions of the PCT is to such an extent that a
meaningful search of the claims identified above was not possible. 



The applicant's attention is drawn to the fact that claims relating to
inventions in respect of which no international search report has been 
established need not be the subject of an international preliminary 
examination (Rule 66.1(e) PCT). The applicant is advised that the EPO 
policy when acting as an International Preliminary Examining Authority is 
normally not to carry out a preliminary examination on matter which has 
not been searched. This is the case irrespective of whether or not the 
claims are amended following receipt of the search report or during any 
Chapter II procedure. If the application proceeds into the regional phase 
before the EPO, the applicant is reminded that a search may be carried 
out during examination before the EPO (see EPO Guideline C-VI, 8.5), 
should the problems which led to the Article 17(2) declaration be 
overcome. 



International Application No. PCT/US2005/033218



FURTHER INFORMATION CONTINUED FROM PCT/ISA/210

This International Searching Authority found multiple (groups of) 
inventions in this international application, as follows: 

Inventions 1-20: claims 11 and 15 (partially)

The subject-matter of this group of different inventions
comprises an isolated nucleic acid molecule comprising a
synthetic nucleotide sequence having a coding region for a
selectable polypeptide, wherein the synthetic nucleotide
sequence has 90% or less nucleic acid sequence identity to a
parent nucleotide sequence encoding a corresponding selectable
polypeptide, wherein the nucleotide sequence encodes a
selectable polypeptide with at least 85% amino acid sequence
identity to the corresponding selectable polypeptide encoded
by the parent nucleotide sequence, and wherein the
synthetic nucleotide sequence comprises an open reading
frame in SEQ ID NO: 4 to SEQ ID NO: 84 as claimed in claims
11 and 15.
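The identity thresholds quoted above (90% or less nucleic acid identity, at least 85% amino acid identity) can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned, with "-" marking gaps, and uses one common convention for percent identity: identical positions divided by alignment columns, skipping columns that are gaps in both sequences. The function names, the gap symbol, and this convention are assumptions for illustration; they are not the method used by the applicant or by the Searching Authority.

```python
# Minimal sketch (assumption): percent identity of two sequences that have
# ALREADY been aligned, with "-" marking gaps. No alignment is performed here.


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return identical positions / aligned columns * 100.

    Columns where both sequences have a gap are skipped; this is one common
    convention, assumed here for illustration only.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    matches = 0
    for x, y in zip(aligned_a.lower(), aligned_b.lower()):
        if x == "-" and y == "-":
            continue  # gap-gap column carries no residue
        columns += 1
        if x == y:
            matches += 1  # x == y here implies neither symbol is a gap
    if columns == 0:
        raise ValueError("alignment contains no residues")
    return 100.0 * matches / columns


def within_claimed_ranges(nt_identity: float, aa_identity: float,
                          max_nt: float = 90.0, min_aa: float = 85.0) -> bool:
    """Hypothetical helper: True if nucleotide identity is at most max_nt
    and amino acid identity is at least min_aa."""
    return nt_identity <= max_nt and aa_identity >= min_aa


if __name__ == "__main__":
    # Second 10-nt group of the 61-120 row of SEQ ID NO: 96 versus
    # SEQ ID NO: 97; the groups differ at one position out of ten.
    print(percent_identity("tgtctatttc", "tgcctatttc"))
    print(within_claimed_ranges(90.0, 100.0))
```

Running the module prints 90.0 and True. Figures such as "85.82% identity in 3095 nt overlap" come from an alignment tool whose own identity convention may differ from the one assumed here.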



Invention 21: claim 31 (partially) 

The subject-matter of this invention comprises an isolated
nucleic acid sequence encoding a firefly luciferase, wherein
the synthetic nucleotide sequence has 80% or less nucleic
acid sequence identity to a parent nucleotide sequence having SEQ ID
NO: 43 or 85% or less nucleic acid sequence identity to a
parent nucleic acid sequence having SEQ ID NO: 14 which
encodes a firefly luciferase, wherein the nucleotide
sequence encodes a firefly luciferase with at least 85%
amino acid sequence identity to the corresponding luciferase
encoded by the parent nucleotide sequence, and wherein the
synthetic nucleotide sequence comprises an open reading
frame in SEQ ID NO: 21-23.



Invention 23: claims 47, 49 and 50 (partially) 

A plasmid comprising SEQ ID NO: 74 which comprises an open 
reading frame with less than 90% nucleic acid sequence 
identity to 41 which confers resistance to ampicillin. 



Inventions 24-46: claims 51-52 (partially) 

A polynucleotide which hybridizes under stringent 
hybridization conditions to SEQ ID NO: 4 to SEQ ID NO: 23 as 
claimed in claim 51 and encodes a selectable polypeptide or 
a firefly luciferase. 



Invention 47: claim 54 



An isolated nucleic acid molecule comprising a synthetic 
nucleotide sequence which does not code for a desirable 
peptide or polypeptide but includes sequences which inhibit 
transcription and/or translation wherein the synthetic 
nucleotide sequence has SEQ ID NO: 49. 



Inventions 48-49: claims 63-67 (partially) 

A plasmid which includes a sequence including SEQ ID NO: 89
or SEQ ID NO: 90.

The search was limited to matter related to invention 1 and
inventions 23, 48 and 49 as requested by the applicant in
his letter dated 13.02.2006.